Abstract

Recent attention in quickest change detection in the multi-sensor setting has been on the case where
the densities of the observations change at the same instant at all the sensors due to the disruption. In
this work, a more general scenario is considered where the change propagates across the sensors, and
its propagation can be modeled as a Markov process. A centralized, Bayesian version of this problem,
with a fusion center that has perfect information about the observations and a priori knowledge of
the statistics of the change process, is considered. The problem of minimizing the average detection
delay subject to false alarm constraints is formulated as a partially observable Markov decision process
(POMDP). Insights into the structure of the optimal stopping rule are presented. In the limiting case of
rare disruptions, we show that the structure of the optimal test reduces to thresholding the a posteriori
probability of the hypothesis that no change has happened. We establish the asymptotic optimality
(in the vanishing false alarm probability regime) of this threshold test under a certain condition on the
Kullback-Leibler (K-L) divergence between the post- and the pre-change densities. In the special case of
near-instantaneous change propagation across the sensors, this condition reduces to the mild condition
that the K-L divergence be positive. Numerical studies show that this low-complexity threshold test
results in a substantial improvement in performance over naive tests such as a single-sensor test or a
test that wrongly assumes that the change propagates instantaneously.

I. Introduction

An important application area for distributed decision-making systems is in environment 
surveillance and monitoring. Specific applications include: i) Intrusion detection in computer 
networks and security systems [2], [3], ii) monitoring cracks and damages to vital bridges 
and highway networks [4], iii) monitoring catastrophic faults to critical infrastructures such as 
water and gas pipelines, electricity connections, supply chains, etc. [5], iv) biological problems 
characterized by an event-driven potential including monitoring human subjects for epileptic 
fits, seizures, dramatic changes in physiological behavior, etc. [6], [7], v) dynamic spectrum 
access and allocation problems [8], vi) chemical or biological warfare agent detection systems 
to protect against terrorist attacks, vii) detection of the onset of an epidemic, and viii) failure 
detection in manufacturing systems and large machines. In all of these applications, the sensors 
monitoring the environment take observations that undergo a change in statistical properties in 
response to a disruption (change) in the environment. The goal is to detect the point of disruption 
(change-point) as quickly as possible, subject to false alarm constraints. 

In the standard formulation of the change detection problem, studied over the last fifty years, 
there is a sequence of observations whose density changes at some unknown point in time and 
the goal is to detect the change-point as soon as possible. Two classical approaches to quickest 
change detection are: i) The minimax approach [9], [10], where the goal is to minimize the 
worst-case delay subject to a lower bound on the mean time between false alarms, and ii) The 
Bayesian approach [11]-[13], where the change-point is assumed to be a random variable with a 
density that is known a priori and the goal is to minimize the expected (average) detection delay 
subject to a bound on the probability of false alarm. Significant advances in both the minimax and 
the Bayesian theories of change detection have been made, and the reader is referred to [9]-[22] 
for a representative sample of the body of work in this area. The reader is also referred to [9], 
[16], [18], [22]-[27] for performance analyses of the standard change detection approaches in 
the minimax context, and [28], [29] in the Bayesian context. 

Extensions of the above framework to the multi-sensor case, where the information available for
decision-making is distributed, have also been explored [29]-[32]. In this setting, the observations
are taken at a set of L distributed sensors, as shown in Fig. 1. The sensors may send either
quantized/unquantized versions of their observations or local decisions to a fusion center, subject
to communication delay, power and bandwidth constraints, where a final decision is made, based 
on all the sensor messages. In particular, in recent work [29]-[32], it is assumed that the statistical 
properties of all the sensors' observations change at the same time. However, in many scenarios, 
it is more suitable to consider the case where the statistics of each sensor's observations may 
change at different points in time. An application of this model is in the detection of pollutants and
biological warfare agents, where the change process is governed by the movement of the agent
through the medium under consideration. Numerous other examples, including those described
earlier, can be modeled in the change process detection framework.

Fig. 1. Change-point detection across a linear array of sensors.

We consider a Bayesian version of this problem and assume that the point of disruption (that 
needs to be detected) is a random variable with a geometric distribution. We assume that the L 
sensors are placed in an array or a line and they observe the change as it propagates through 
them. We model the inter-sensor delay with a Markov model and in particular, the focus is on 
the case where the inter-sensor delay is also geometric. More general inter-sensor delay models 
can be considered, but the case of a geometric prior has an intuitive and appealing interpretation 
due to the memorylessness property of the geometric random variable. 

We study the centralized case, where the fusion center has complete information about the 
observations at all the L sensors, the change process statistics, and the pre- and the post-change 
densities. This is applicable in scenarios where: i) the fusion center is geographically collocated 
with the sensors so that ample bandwidth is available for reliable communication between the 
sensors and the fusion center; and ii) the impact of the disruption-causing agent on the statistical 
dynamics of the change process and the statistical nature of the change so induced can be 
modeled accurately. 

Summary of Main Contributions: The goal of the fusion center is to come up with a strategy
(or a stopping rule) to declare change, subject to false alarm constraints. Towards this goal,
we first show that the problem fits the standard partially observable Markov decision process
(POMDP) framework [33] with the sufficient statistics given by the a posteriori probabilities of 
the state of the system conditioned on the observation process. We then establish a recursion for 
the sufficient statistics, which generalizes the recursion established in [32] for the case when all 
the sensors observe the change at the same instant. 

Following the logic of [34] and [32], we then establish the optimality of a more general 
stopping rule for change detection. This rule takes the form of the smallest time of cross-over 
(intersection) of a linear functional (or hyperplane) in the space of sufficient statistics with a 
non-linear concave function, and generalizes the threshold test of [32]. While further analytical 
characterization of the optimal stopping rule is difficult in general, in the extreme scenario of 
a rare disruption regime, we show that the structure of this rule reduces to a simple threshold 
test on the a posteriori probability that no change has happened. This low-complexity test is 
denoted as $\nu_A$ (corresponding to an appropriate choice of threshold $A$) for simplicity.

While $\nu_A$ is obtained as a limiting form of the optimal test, it is not clear (as yet) if it is
a "good" test. To address this, we show that it is asymptotically optimal (as the false alarm
probability $P_{FA}$ vanishes) under a certain condition on the Kullback-Leibler (K-L) divergence
between the post- and the pre-change densities. Meeting this condition becomes easier as
change propagates more quickly across the sensor array, and in the extreme case of [32],
this condition reduces to the mild one that the K-L divergence be positive.

The difference between the setting in this work and the setting in [32] is in the non-asymptotic,
but small $P_{FA}$ regime. Asymptotic optimality of a particular test in the setting of [32] translates to
an L-fold increase in the slope of $E_{DD}$ vs. $P_{FA}$ in the regime where the false alarm probability
is small, but not vanishing (e.g., Pfa ~ 10^^ or 10^^). However, if the change propagates 
too "slowly" across the sensor array, numerical studies indicate that not all of the L sensors'
observations may contribute to the performance of $\nu_A$ in this regime. Nevertheless, as $P_{FA} \to 0$,
all the L sensors are expected (in general) to contribute to the slope.

Thus, while it is not clear if $\nu_A$ is asymptotically optimal in general, or even if all the sensors'
observations contribute to its performance in the non-asymptotic regime, numerical studies also
show that it can result in substantial performance improvement over naive tests such as the
single sensor test (where only the first sensor's observation is used in decision-making) or
the mismatched test (where all the sensors' observations are used in decision-making, albeit
with a wrong model that change propagates instantaneously), especially in regimes of practical
importance (rare disruption, and reasonably quick, but non-instantaneous change propagation
across the sensors). The performance improvement possible with $\nu_A$, in addition to its low
complexity, makes it an attractive choice for many practical applications with a basis in
multi-sensor change process detection.

Organization: This paper is organized as follows. The change process detection problem is
formally set up in Section II. In Section III, this problem is posed in a POMDP framework and
the sufficient statistics of the dynamic program (DP) are identified. A recursion for the sufficient
statistics is then established. The structure of the optimal stopping rule in the general case and
in the rare disruption regime is illustrated in Section IV. The limiting form of the optimal test is
denoted as $\nu_A$ for simplicity. Using elementary tools from renewal theory, asymptotic optimality
of $\nu_A$ is established in Sections V - VII under certain conditions. (The main results are stated in
Sec. V and they are established in detail in the appendices and in Sec. VI and VII.) A discussion
of the main results and numerical studies to illustrate our results are provided in Section VIII.
Concluding remarks are made in Section IX.

II. Problem Formulation 

Consider a distributed system with an array of L sensors, as in Fig. 1, that observes an L-dimensional
discrete-time stochastic process $Z_k = [Z_{k,1}, \cdots, Z_{k,L}]$, where $Z_{k,\ell}$ is the observation
at the $\ell$-th sensor and the $k$-th time instant. A disruption in the sensing environment occurs at
the random time instant $\Gamma_1$ and hence, the density^1 of the observations at each sensor undergoes
a change from the null density $f_0$ to the alternate density $f_1$.

Change Process Model: Previous works on quickest change detection in multi-sensor systems
consider strategies to detect the change-point, $\Gamma_1$, when the change occurs at the same instant
across all the sensors [29]-[32]. As described in the introduction, it is useful to consider more
general scenarios where there exist random propagation delays in the change-point across the
sensors.

In this work, we consider a change process where the change-point evolves across the sensor
array. In particular, the change-point as seen by the $\ell$-th sensor is denoted as $\Gamma_\ell$. We assume
that the evolution of the change process is Markovian across the sensors. That is,

$$
P(\{\Gamma_{\ell_1+\ell_2+\ell_3} = m_1 + m_2 + m_3\} \mid \{\Gamma_{\ell_1+\ell_2} = m_1 + m_2\}, \{\Gamma_{\ell_1} = m_1\})
= P(\{\Gamma_{\ell_1+\ell_2+\ell_3} = m_1 + m_2 + m_3\} \mid \{\Gamma_{\ell_1+\ell_2} = m_1 + m_2\})
$$

for all $\ell_i$ and $m_i \geq 0$, $i = 1, 2, 3$. Further simplification of the analysis is possible under a
joint-geometric model on $\{\Gamma_\ell\}$. Under this model, the change-point ($\Gamma_1$) evolves as a geometric
random variable with parameter $\rho$, and inter-sensor change propagation is modeled as a geometric
random variable with parameter $\{\rho_{\ell-1,\ell}, \ell = 2, \cdots, L\}$. That is,

$$
P(\{\Gamma_1 = m\}) = \rho (1 - \rho)^m, \quad m \geq 0, \quad \text{and}
$$
$$
P(\{\Gamma_\ell = m_1 + m_2\} \mid \{\Gamma_{\ell-1} = m_2\}) = \rho_{\ell-1,\ell} (1 - \rho_{\ell-1,\ell})^{m_1}, \quad m_1 \geq 0,
$$

independent of $m_2 \geq 0$ for all $\ell$ such that $2 \leq \ell \leq L$.

^1 We assume that the pre-change ($f_0$) and the post-change ($f_1$) densities exist.

We will find it convenient to set $\rho_{0,1} = \rho$ and $\rho_{L,L+1} = 0$ so that $\rho_{\ell-1,\ell}$ is defined for all
$\ell = 1, \cdots, L+1$. This is also consistent with an equivalent (L + 2)-sensor system where sensor
indices run through $\{\ell = 0, \cdots, L+1\}$. The hypothetical zero-th sensor models the disruption
point, the first real sensor observes change with respect to the zero-th sensor with a geometric
parameter $\rho$ (and so on). The hypothetical (L + 1)-th sensor models an "observer at infinity"^2
that observes change from the L-th sensor with an infinite delay on average. This is reflected
by setting $\rho_{L,L+1} = 0$. At this point, it should be noted that [29]-[32] consider this equivalent
framework explicitly by modeling $\gamma$, the probability that the disruption took place before the
observations were made. The setup in [29]-[32] can be obtained by setting:

$$
P(\{\Gamma_0 < 0\}) = \gamma \quad \text{and} \quad P(\{\Gamma_0 = 0\}) = 1 - \gamma \quad \text{for some } \gamma \in [0, 1].
$$

In this work, we focus on the case where $\gamma = 0$, with extension to the general case being
straightforward.
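The joint-geometric model can be illustrated by drawing change-points sensor by sensor. The following is a minimal sketch, not part of the original development: the function names, the seed and the parameter values are illustrative assumptions, and geometric variables are taken on {0, 1, 2, ...} as in the model above.

```python
import random


def sample_geometric(rho, rng):
    """Draw from P(X = m) = rho * (1 - rho)**m on m = 0, 1, 2, ..."""
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must lie in (0, 1]")
    m = 0
    # Count failures before the first success.
    while rng.random() >= rho:
        m += 1
    return m


def sample_change_points(rho, rho_inter, rng):
    """Return [Gamma_1, ..., Gamma_L] under the joint-geometric model.

    rho       : parameter of the disruption point Gamma_1
    rho_inter : [rho_{1,2}, ..., rho_{L-1,L}], inter-sensor parameters
    """
    gammas = [sample_geometric(rho, rng)]
    for r in rho_inter:
        # Markov property: the next change-point depends only on the previous one.
        gammas.append(gammas[-1] + sample_geometric(r, rng))
    return gammas


rng = random.Random(0)  # arbitrary fixed seed (assumption) for repeatability
paths = [sample_change_points(0.2, [0.5, 1.0], rng) for _ in range(20000)]
mean_gamma1 = sum(p[0] for p in paths) / len(paths)
```

With the third sensor's parameter set to 1, the second and third change-points always coincide, which is the instantaneous-propagation case; the empirical mean of the first change-point should be close to (1 - 0.2)/0.2 = 4.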

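While a joint-geometric model is consistent with the Markovian assumption as only the
inter-sensor (one-step) propagation parameters are modeled, the change-points at the individual
sensors themselves are not geometric. For example, when the parameters are distinct, it can be
checked that

$$
P(\{\Gamma_2 = m\}) = \frac{\rho \rho_{1,2}}{\rho - \rho_{1,2}} \left( (1 - \rho_{1,2})^{m+1} - (1 - \rho)^{m+1} \right),
$$
$$
P(\{\Gamma_3 = m\}) = \frac{\rho \rho_{1,2} \rho_{2,3}}{(\rho - \rho_{1,2})(\rho_{1,2} - \rho_{2,3})(\rho - \rho_{2,3})}
\left( (\rho - \rho_{1,2})(1 - \rho_{2,3})^{m+2} - (\rho - \rho_{2,3})(1 - \rho_{1,2})^{m+2} + (\rho_{1,2} - \rho_{2,3})(1 - \rho)^{m+2} \right),
$$

and so on. It should be clear from the above expressions that a joint-geometric model does not
impose any constraints on $\{\rho_{\ell-1,\ell}\}$ except that $\rho_{\ell-1,\ell} \in [0, 1]$.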
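The closed-form marginals of the second and third change-points can be checked numerically against direct convolution of geometric probability mass functions. This is a hedged sketch: the function names and the parameter values used in any call are illustrative assumptions, and the closed forms require distinct parameters.

```python
def geometric_pmf(rho, m):
    """P(X = m) for a geometric variable on {0, 1, 2, ...}."""
    return rho * (1.0 - rho) ** m


def gamma2_pmf_convolution(rho, rho12, m):
    """P(Gamma_2 = m): convolution of Gamma_1 with the first inter-sensor delay."""
    return sum(geometric_pmf(rho, j) * geometric_pmf(rho12, m - j)
               for j in range(m + 1))


def gamma2_pmf_closed_form(rho, rho12, m):
    """Closed form for P(Gamma_2 = m); requires rho != rho12."""
    scale = rho * rho12 / (rho - rho12)
    return scale * ((1.0 - rho12) ** (m + 1) - (1.0 - rho) ** (m + 1))


def gamma3_pmf_convolution(rho, rho12, rho23, m):
    """P(Gamma_3 = m): convolution of Gamma_2 with the second inter-sensor delay."""
    return sum(gamma2_pmf_convolution(rho, rho12, j) * geometric_pmf(rho23, m - j)
               for j in range(m + 1))


def gamma3_pmf_closed_form(rho, rho12, rho23, m):
    """Closed form for P(Gamma_3 = m); requires pairwise distinct parameters."""
    denom = (rho - rho12) * (rho12 - rho23) * (rho - rho23)
    bracket = ((rho - rho12) * (1.0 - rho23) ** (m + 2)
               - (rho - rho23) * (1.0 - rho12) ** (m + 2)
               + (rho12 - rho23) * (1.0 - rho) ** (m + 2))
    return rho * rho12 * rho23 * bracket / denom
```

Comparing successive ratios of `gamma2_pmf_closed_form` for increasing `m` also shows that the marginal is not geometric, since a geometric pmf would have a constant ratio.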

Note that $\rho \to 1$ corresponds to the case where instantaneous disruption (that is, the event
$\{\Gamma_1 = 0\}$) has a high probability of occurrence. On the other hand, $\rho \to 0$ uniformizes the
change-point in the sense that the disruption is equally likely to happen at any point in time.
This case where the disruption is "rare" is of significant interest in practical systems [16],
[19], [29]-[32]. This is also the case where we will be able to make insightful statements
about the structure of the optimal stopping rule. Similarly, we can also distinguish between two
extreme scenarios at sensor $\ell$ depending on whether $\rho_{\ell-1,\ell} \to 0$ or $\rho_{\ell-1,\ell} \to 1$. The case where
$\rho_{\ell-1,\ell} \to 1$ corresponds to instantaneous change propagation at sensor $\ell$ and $\{\Gamma_\ell = \Gamma_{\ell-1}\}$ with
high probability. The case where $\rho_{\ell-1,\ell} \to 0$ corresponds to uniformly likely propagation delay.
The widely-used assumption [29], [32] of instantaneous change propagation across sensors is
equivalent to assuming $\rho_{\ell-1,\ell} = 1$ for all $\ell = 2, \cdots, L$.

^2 "Observer at infinity" interpretations are often used in distributed decision-making and stochastic control problems [33], [34].



Observation Model: To simplify the study, we assume that the observations (at every sensor)
are independent, conditioned^3 on the change hypothesis corresponding to that sensor, and are
identically distributed pre- and post-change, respectively. That is,

$$
Z_{k,\ell} \sim \begin{cases} \text{i.i.d. } f_0 & \text{if } k < \Gamma_\ell, \\ \text{i.i.d. } f_1 & \text{if } k \geq \Gamma_\ell. \end{cases}
$$

We will describe the above assumption as that corresponding to an "i.i.d. observation process."
Let $D(f_1, f_0)$ denote the Kullback-Leibler divergence between $f_1$ and $f_0$. That is,

$$
D(f_1, f_0) = \int \log\left( \frac{f_1(x)}{f_0(x)} \right) f_1(x) \, dx. \qquad (1)
$$

We also assume that the measure described by $f_0$ is absolutely continuous with respect to that
described by $f_1$. That is, if $f_1(x) = 0$ for some $x$, then $f_0(x) = 0$. This condition ensures that

$$
E_{f_1}\left[ \frac{f_0(\cdot)}{f_1(\cdot)} \right] = 1.
$$
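As an illustration of the divergence in (1), the integral can be approximated numerically and compared with a known closed form. The sketch below assumes unit-variance Gaussian pre- and post-change densities, a choice made here only for illustration; the helper names and the integration range are also assumptions.

```python
import math


def gaussian_logpdf(x, mean, sigma):
    """Log-density of a Gaussian with the given mean and standard deviation."""
    return (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - (x - mean) ** 2 / (2.0 * sigma ** 2))


def kl_divergence_numeric(logpdf1, logpdf0, lo, hi, n=20000):
    """Trapezoidal approximation of D(f1, f0) = int log(f1/f0) f1 dx on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        l1 = logpdf1(x)
        term = (l1 - logpdf0(x)) * math.exp(l1)
        # End points carry half weight in the trapezoidal rule.
        total += term * (0.5 if i in (0, n) else 1.0)
    return total * h


# Assumed example: f0 = N(0, 1) pre-change, f1 = N(1, 1) post-change.
log_f0 = lambda x: gaussian_logpdf(x, 0.0, 1.0)
log_f1 = lambda x: gaussian_logpdf(x, 1.0, 1.0)
d_numeric = kl_divergence_numeric(log_f1, log_f0, -10.0, 12.0)
# For equal-variance Gaussians, D = (mean difference)^2 / (2 * variance).
d_closed = (1.0 - 0.0) ** 2 / (2.0 * 1.0 ** 2)
```

Working with log-densities avoids dividing two very small numbers in the tails of the integration range.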

Performance Metrics: We consider a centralized, Bayesian setup where a fusion center has
complete knowledge of the observations from all the sensors, $I_k \triangleq \{Z_1, \cdots, Z_k\}$, in addition
to knowledge of statistics of the change process (equivalently, $\{\rho_{\ell-1,\ell}\}$) and statistics^4 of the
observation process (equivalently, $f_0$ and $f_1$). The fusion center decides whether a change has
happened or not based on the information, $I_k$, available to it at time instant $k$ (equivalently, it
provides a stopping rule or stopping time $\tau$).

The two conflicting performance measures for quickest change detection are the probability
of false alarm, $P_{FA} \triangleq P(\{\tau < \Gamma_1\})$, and the expected detection delay, $E_{DD} \triangleq E[(\tau - \Gamma_1)^+]$,
where $x^+ = \max(x, 0)$. This conflict is captured by the Bayes risk, defined as

$$
R(c) \triangleq P_{FA} + c E_{DD} = E\left[ \mathbb{1}(\{\tau < \Gamma_1\}) + c (\tau - \Gamma_1)^+ \right]
$$

for an appropriate choice of per-unit delay cost $c$, where $\mathbb{1}(\{\cdot\})$ is the indicator function of the
event $\{\cdot\}$. We will be particularly interested in the regime where $c \to 0$. That is, a regime where
minimizing $P_{FA}$ is more important than minimizing $E_{DD}$, or equivalently, the asymptotics where
$P_{FA} \to 0$.

^3 More general observation (correlation) models are important in practical settings. This will be the subject of future work.
^4 We assume that the fusion center has knowledge of $f_0$ and $f_1$ so that it can use this information to declare that a change
has happened. Relaxing this assumption is important in the context of practical applications and is the subject of current work.

The goal of the fusion center is to determine

$$
\tau_{opt} = \arg \inf_{\tau \in \Delta_\alpha} E_{DD}(\tau)
$$

from the class of change-point detection procedures $\Delta_\alpha = \{\tau : P_{FA}(\tau) \leq \alpha\}$ for which the
probability of false alarm does not exceed $\alpha$. In other words, the fusion center needs to come up
with a strategy (a stopping rule $\tau$) to minimize the Bayes risk. Note that the strategy developed
by optimizing the Bayes risk can also be used for the other classical problem formulation in
change detection, that of the minimax type [32, Theorem 1], [13], [33].

III. Dynamic Programming Framework 
It is straightforward to check that [13, pp. 151-152], [32] the Bayes risk can be written as

$$
R(c) = P(\{\Gamma_1 > \tau\}) + c E\left[ \sum_{k=0}^{\tau-1} P(\{\Gamma_1 \leq k\} \mid I_k) \right].
$$

Towards solving for the optimal stopping time, we restrict attention to a finite horizon, say the
interval [0, T], and proceed via a dynamic programming (DP) argument.

The state of the system at time $k$ is the vector $S_k = [S_{k,1}, \cdots, S_{k,L}]$ with $S_{k,\ell}$ denoting the
state at sensor $\ell$. The state $S_{k,\ell}$ can take the value 1 (post-change), 0 (pre-change), or $t$ (terminal).
The system goes to the terminal state $t$ once a change-point decision $\tau$ has been declared. The
state evolves as follows:

$$
S_{k,\ell} = \mathbb{1}(\{\Gamma_\ell \leq k\} \cap \{S_{k-1,\ell} \neq t\} \cap \{\tau \neq k\}) + t \cdot \mathbb{1}(\{S_{k-1,\ell} = t\} \cup \{\tau = k\})
$$

with $S_0 = 0$. Since $S_{k-1}$ captures the information contained in $\{\Gamma_\ell \leq j\}$ for $0 \leq j \leq k-1$
and all $\ell$, given $S_{k-1}$, $\{\Gamma_\ell \leq k\}$ is independent of $\{\Gamma_\ell \leq j, j \leq k-1\}$ for all $\ell$. Thus, the state
evolution satisfies the Markov condition needed for dynamic programming.

The state is not observable directly, but only through the observations. The observation
equation can be written as

$$
Z_{k,\ell} = V^{(S_{k,\ell})}_{k,\ell} \, \mathbb{1}(\{S_{k,\ell} \neq t\}) + \xi \, \mathbb{1}(\{S_{k,\ell} = t\}), \quad \ell \geq 1
$$

where $V^{(0)}_{k,\ell}$ and $V^{(1)}_{k,\ell}$ are the $k$-th samples from independently generated infinite arrays of
i.i.d. data according to $f_0$ and $f_1$, respectively. When the system is in the terminal state, the
observations do not matter (since a change decision has already been made) and are hence
denoted by a dummy random variable, $\xi$. It is clear that the observation uncertainty $(V^{(0)}_{k,\ell}, V^{(1)}_{k,\ell})$
satisfies the necessary Markov conditions for dynamic programming since they are i.i.d. in time.



Finally, the expected cost (Bayes risk) can be expressed as the expectation of an additive cost
over time by defining

$$
g_k(S_k) = c \, \mathbb{1}(\{S_{k,1} = 1\})
$$

and a terminal cost $\mathbb{1}(\{S_{k,1} = 0\})$. Thus the problem fits the standard POMDP framework with
termination [33], with the sufficient statistic (belief state) being given by

$$
P(\{S_k = s_k\} \mid I_k),
$$

where $I_k = \{Z_1, \ldots, Z_k\}$ for $s_k$ such that $s_k \neq t$, i.e., $s_{k,\ell} \in \{0, 1\}$ for each $\ell$. Note that this
sufficient statistic is described by $2^L$ conditional probabilities, corresponding to the $2^L$ values
that $s_k$ can take. We will next see that this sufficient statistic can be further reduced^5 to only L
independent probability parameters in the general case.

The fusion center determines $\tau$ and hence, the minimum expected cost-to-go at time $k$ for
the above DP problem can be seen to be a function of $I_k$. For a finite horizon $T$, the cost-to-go
function is denoted as $J_k^T(I_k)$ and is of the form (see [32], [33, p. 133] for examples of similar
nature):

$$
J_T^T(I_T) = P(\{\Gamma_1 > T\} \mid I_T)
$$
$$
J_k^T(I_k) = \min\left\{ P(\{\Gamma_1 > k\} \mid I_k), \; c P(\{\Gamma_1 \leq k\} \mid I_k) + E\left[ J_{k+1}^T(I_{k+1}) \mid I_k \right] \right\}, \quad 0 \leq k < T
$$

where $I_0$ is the empty set. The first term in the above minimization corresponds to the cost
associated with stopping at time $k$, while the second term corresponds to the cost associated
with proceeding to time $k+1$ without stopping. The minimum expected cost for the finite-horizon
optimization problem is $J_0^T(I_0)$.

Recursion for the Sufficient Statistics: Consider the special case where change at all the sensors
happens at the same instant. In this setting, it can be shown that the random variable $p_k \triangleq
P(\{\Gamma_1 \leq k\} \mid I_k)$ serves as the sufficient statistic for the above dynamic program and affords
a recursion [32]. To consider the more general case, we define an (L + 1)-tuple of conditional
probabilities, $\{p_{k,\ell}, \ell = 1, \cdots, L+1\}$:

$$
p_{k,\ell} \triangleq P(\{\Gamma_1 \leq k, \cdots, \Gamma_{\ell-1} \leq k, \Gamma_\ell > k, \cdots, \Gamma_L > k\} \mid I_k).
$$

The special setting of [32] is then equivalent to

$$
p_{k,L+1} = p_k, \quad p_{k,1} = 1 - p_k, \quad \text{and} \quad p_{k,\ell} = 0, \; \ell = 2, \cdots, L.
$$

^5 This should not be entirely surprising since there exists a "natural" ordering on the sensors' change-points. They can be
arranged in non-decreasing order: $\Gamma_\ell \geq \Gamma_{\ell-1}$ for all $\ell$. The primary reason for such an ordering to exist is that we assume an
array (or line) of sensors in this work. Extensions to more general (or unknown) geometries of sensors are of interest in practice.



We now show that $p_k = [p_{k,1}, \cdots, p_{k,L+1}]$ can be obtained from $p_{k-1}$ via a recursive approach.
For this, we note that the underlying probability space in the setup can be partitioned as

$$
\bigcup_{\ell=1}^{L+1} T_{k,\ell}, \qquad
T_{k,\ell} \triangleq \{\Gamma_1 \leq k, \cdots, \Gamma_{\ell-1} \leq k, \Gamma_\ell \geq k+1, \cdots, \Gamma_L \geq k+1\}.
$$

The event where no sensor has observed the change is denoted as $T_{k,1}$. (The test that will be
proposed and studied later in the paper thresholds the a posteriori probability of $T_{k,1}$.) On the
other hand, $T_{k,\ell}$ (for $\ell \geq 2$) corresponds to the event where the maximal index of the sensor that
has observed the change by time instant $k$ is $\ell - 1$. Observe that $p_{k,\ell}$ is the probability of
$T_{k,\ell}$ conditioned on $I_k$.

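To show that $p_{k,\ell}$ can be written in terms of $p_{k-1}$, the observations and the prior
probabilities, we partition $T_{k,\ell}$ further as

$$
T_{k,\ell} = \bigcup_{j=1}^{\ell} U_{k,\ell,j}
$$
$$
U_{k,\ell,j} \triangleq \{\Gamma_1 \leq k-1, \cdots, \Gamma_{j-1} \leq k-1, \Gamma_j = k, \cdots, \Gamma_{\ell-1} = k,
\Gamma_\ell \geq k+1, \cdots, \Gamma_L \geq k+1\}, \quad 1 \leq j \leq \ell.
$$

Note that $U_{k,\ell,j} \cap T_{k-1,j} = U_{k,\ell,j}$. Using the new partition $\{U_{k,\ell,j}, j = 1, \cdots, \ell\}$ and applying
Bayes' rule repeatedly, it can be checked that $p_{k,\ell}$ can be written as

$$
p_{k,\ell} = \frac{\sum_{m=1}^{\ell} f(Z_k \mid I_{k-1}, U_{k,\ell,m}) P(U_{k,\ell,m} \mid I_{k-1})}
{\sum_{j=1}^{L+1} \sum_{m=1}^{j} f(Z_k \mid I_{k-1}, U_{k,j,m}) P(U_{k,j,m} \mid I_{k-1})}
\triangleq \frac{\mathcal{N}_\ell}{\sum_{j=1}^{L+1} \mathcal{N}_j}
$$

where $f(\cdot \mid \cdot)$ denotes the conditional probability density function of $Z_k$ and $\mathcal{N}_\ell$ denotes the
numerator term.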

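From the i.i.d. assumption on the statistics of the observations, the first term within the
summation for $\mathcal{N}_\ell$ can be written as:

$$
f(Z_k \mid I_{k-1}, U_{k,\ell,m}) = \prod_{j=1}^{\ell-1} f_1(Z_{k,j}) \prod_{j=\ell}^{L} f_0(Z_{k,j})
= \prod_{j=1}^{\ell-1} L_{k,j} \prod_{j=1}^{L} f_0(Z_{k,j})
$$

where $L_{k,j} = f_1(Z_{k,j}) / f_0(Z_{k,j})$ is the likelihood ratio of the two hypotheses given that $Z_{k,j}$ is observed at
the $j$-th sensor at the $k$-th instant. For the second term, observe from the definitions that

$$
P(U_{k,\ell,m} \mid I_{k-1}) = P(T_{k-1,m} \mid I_{k-1}) \cdot \frac{P(U_{k,\ell,m})}{P(T_{k-1,m})}.
$$

Thus, we have

$$
\mathcal{N}_\ell = \left( \sum_{m=1}^{\ell} w_{k,\ell,m} \cdot p_{k-1,m} \right) \times
\underbrace{\prod_{m=1}^{\ell-1} L_{k,m} \prod_{m=1}^{L} f_0(Z_{k,m})}_{\Phi_{obs}(k,\ell)},
\qquad w_{k,\ell,m} \triangleq \frac{P(U_{k,\ell,m})}{P(T_{k-1,m})},
$$

where the first part is a weighted sum of $p_{k-1,m}$ with weights decided by the prior probabilities,
and the second part of the evolution equation, $\Phi_{obs}(k, \ell)$, can be viewed as that part that depends
only on the observation $Z_k$.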

Several observations are in order at this stage:

• The above expansion for $\mathcal{N}_\ell$ can be easily explained intuitively: If the maximal sensor index
observing the change by time $k$ is $\ell - 1$, then the maximal sensor index observing the change
by time $k - 1$ should be from the set $\{0, \cdots, \ell - 1\}$.

• Using the joint-geometric model for $\{\Gamma_\ell\}$, it can be shown that $w_{k,\ell,m}$ is of the form:

$$
w_{k,\ell,m} = \frac{P(U_{k,\ell,m})}{P(T_{k-1,m})} = (1 - \rho_{\ell-1,\ell}) \cdot \prod_{j=m-1}^{\ell-2} \rho_{j,j+1}
\triangleq (1 - \rho_{\ell-1,\ell}) \cdot w_{\ell,m}
$$
$$
\mathcal{N}_\ell = \prod_{m=1}^{\ell-1} L_{k,m} \prod_{m=1}^{L} f_0(Z_{k,m}) \cdot (1 - \rho_{\ell-1,\ell}) \times
\left( \sum_{m=1}^{\ell} p_{k-1,m} \cdot w_{\ell,m} \right) \qquad (2)
$$

with the understanding that the product term in the definition of $w_{\ell,m}$ is vacuous (and is
to be replaced by 1) if $m = \ell$. It is important to note that the joint-geometric assumption
renders the weights ($w_{k,\ell,m}$) associated with $p_{k-1,m}$ independent of $k$. This will be useful
later in establishing convergence properties for the DP.

• It is important to note that given a fixed value of $\ell$, $p_{k,\ell}$ is dependent on the entire vector
$p_{k-1}$ and not on $p_{k-1,\ell}$ alone. Thus, the recursion for $\mathcal{N}_\ell$ implies that $p_k$ forms the sufficient
statistic and the function $J_k^T(I_k)$ can be written as a function of only $p_k$, say $J_k^T(p_k)$. The
finite-horizon DP equations can then be rewritten as

$$
J_T^T(p_T) = p_{T,1}
$$
$$
J_k^T(p_k) = \min\left\{ p_{k,1}, \; c(1 - p_{k,1}) + A_k^T(p_k) \right\}
$$

with

$$
A_k^T(p_k) \triangleq E\left[ J_{k+1}^T(p_{k+1}) \mid I_k \right]
= \int \left[ J_{k+1}^T(p_{k+1}) f(Z_{k+1} \mid I_k) \right]_{Z_{k+1} = z} dz.
$$

The previously established recursion for $p_{k+1}$ ensures that the right-hand side is indeed a
function of $p_k$.



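• It is easy to check that the general framework reduces to the special case when all the
change-points coincide with $\Gamma_1$ [32]. In this case, only $T_{k,1}$ and $T_{k,L+1}$ are non-empty sets
with

$$
T_{k,1} = \{\Gamma_1 \geq k+1\}, \quad \text{and} \quad T_{k,L+1} = \{\Gamma_1 \leq k\},
$$
$$
p_{k,L+1} = p_k, \quad p_{k,1} = 1 - p_k \quad \text{and} \quad p_{k,\ell} = 0, \; \ell = 2, \cdots, L.
$$

Furthermore, the recursion for $p_k$ reduces to

$$
p_k = \frac{\mathcal{N}}{\prod_{j=1}^{L} f_0(Z_{k,j}) (1 - p_{k-1})(1 - \rho) + \mathcal{N}}
$$
$$
\mathcal{N} = \prod_{j=1}^{L} f_1(Z_{k,j}) \left( (1 - p_{k-1}) \rho + p_{k-1} \right)
$$

which coincides with [32, eqn. (13)-(15)]. This case can also be obtained from the formula
in (2) by setting $\rho_{\ell-1,\ell} = 1$ for all $\ell$ with $2 \leq \ell \leq L$.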
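The recursion for the sufficient statistics can be sketched in a few lines. The code below is an illustrative assumption-laden sketch, not the authors' implementation: the function name and the list layout are hypothetical, `rho[j]` stands for $\rho_{j,j+1}$ with `rho[0]` the change-point parameter and `rho[L]` equal to 0, and the common factor $\prod_j f_0(Z_{k,j})$ is dropped because it cancels in the normalization.

```python
def belief_update(p_prev, rho, lik_ratios):
    """One step of the recursion for p_k = [p_{k,1}, ..., p_{k,L+1}].

    p_prev     : belief p_{k-1}, a list of length L + 1
    rho        : rho[j] = rho_{j,j+1} for j = 0..L (rho[L] must be 0)
    lik_ratios : f1(Z_{k,j}) / f0(Z_{k,j}) for sensors j = 1..L
    """
    L = len(lik_ratios)
    numerators = []
    for ell in range(1, L + 2):
        # Observation part: likelihood ratios of sensors 1..ell-1.
        obs = 1.0
        for j in range(ell - 1):
            obs *= lik_ratios[j]
        # Prior part: weighted sum of the previous belief.
        prior = 0.0
        for m in range(1, ell + 1):
            w = 1.0
            for j in range(m - 1, ell - 1):  # j = m-1 .. ell-2, empty if m == ell
                w *= rho[j]
            prior += p_prev[m - 1] * w
        numerators.append(obs * (1.0 - rho[ell - 1]) * prior)
    total = sum(numerators)
    return [n / total for n in numerators]
```

For instance, with two sensors, instantaneous propagation (`rho = [0.05, 1.0, 0.0]`), previous belief `[0.8, 0.0, 0.2]` and likelihood ratios `[2.0, 0.5]`, the middle entry stays at zero and the update agrees with the single-parameter recursion of the special case.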

IV. Structure Of The Optimal Stopping Rule ($\tau_{opt}$)

The goal of this section is to study the structure of the optimal stopping rule, $\tau_{opt}$. For this,
we follow the same outline as in [32], [34] (see, also [33, p. 133] for a similar example) and
study the infinite-horizon version of the DP problem by letting $T \to \infty$.

Theorem 1: Let $p = [p_1, \cdots, p_{L+1}]$ be an element of the standard L-dimensional simplex $\mathcal{P}$,
defined as $\mathcal{P} \triangleq \{p : \sum_{j=1}^{L+1} p_j = 1\}$. The infinite-horizon cost-to-go for the DP is of the form

$$
J(p) = \min\left\{ p_1, \; c(1 - p_1) + A_J(p) \right\}
$$

where the function $A_J(p)$: i) is concave in $p$ over $\mathcal{P}$; ii) is bounded as $0 \leq A_J(p) \leq 1$; and iii)
satisfies $A_J(p) = 0$ over the hyperplane $\{p : p_1 = 0\}$.

Proof: Before considering the infinite-horizon DP, we will study the finite-horizon version
and establish some properties along the directions of [32]-[34]. A straightforward induction
argument shows that if $T$ is fixed,

$$
0 \leq J_k^T(p) \leq 1 \text{ for all } 0 \leq k \leq T,
$$
$$
0 \leq A_k^T(p) \leq 1 \text{ for all } 0 \leq k < T.
$$

Similarly, it is easy to observe that for any $k$, $A_k^T(p)$ and $J_k^T(p)$ equal zero if $p_1 = 0$. In
Appendix A, the concavity of $A_k^T(\cdot)$ and $J_k^T(\cdot)$ is established via a routine induction argument.

We now consider the infinite-horizon DP and show that it is well-defined. (That is, we remove
the restriction that the stopping time is finite and let $T \to \infty$.) Towards this end, we need to
establish that $\lim_{T \to \infty} J_k^T(\cdot)$ exists, which is done as follows: By an induction argument, we note that
for any $p$ and $T$ fixed, we have

$$
J_k^T(p) \leq J_{k+1}^T(p), \quad 0 \leq k \leq T-1.
$$

It is important to note that this conclusion critically depends on the joint-geometric assumption
of the change process (in particular, the memorylessness property that results in the independence
of $w_{k,\ell,m}$ on $k$ in (2)) and the i.i.d. nature of the observation process conditioned on the
change-point.

Using a similar induction approach, observe that for any $p$ and $k$ fixed, $J_k^{T+1}(p) \leq J_k^T(p)$.
Heuristically, this can also be seen to be true because the set of stopping times increases with
$T$. Since $J_k^T(p) \geq 0$ for all $k$ and $T$, for any fixed $k$, we can let $T \to \infty$ and we have

$$
\lim_{T \to \infty} J_k^T(p) = \inf_{T : T > k} J_k^T(p) \triangleq J_k^\infty(p).
$$

Furthermore, the memorylessness property and the i.i.d. observation process result in the
invariance of $J_k^\infty(p)$ on $k$. This can be shown by a simple time-shift argument. Denote this common
limit as $J(p)$.

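A simple dominated convergence argument [35] then shows that $\lim_{T \to \infty} A_k^T(p)$ is well-defined
and independent of $k$. If we denote this limit as $A_J(p)$, we have

$$
A_J(p) = \int \left[ J(\tilde{p}) f(Z \mid I_k) \right]_{Z = z} dz
= \int \left[ J(\tilde{p}) \sum_{j=1}^{L+1} \sum_{m=1}^{j} w_{k,j,m} \, p_m \, \Phi_{obs}(\cdot, j) \right]_{Z = z} dz
$$

where $\tilde{p}$ denotes the belief obtained from $p$ through the recursion after observing $Z = z$, the
weights $w_{k,j,m}$ are the prior weights of the recursion (independent of $k$), and the fact that
$\Phi_{obs}(k, j) \mid_{Z_k = z}$ is independent of $k$ is denoted as $\Phi_{obs}(\cdot, j)$. Hence, the
infinite-horizon cost-to-go can be written as

$$
J(p) = \min\left\{ p_1, \; c(1 - p_1) + A_J(p) \right\}.
$$

The structure of $A_J(p)$ follows from the finite-horizon characterization by letting $T \to \infty$. ■

At this stage, it is a straightforward consequence that the optimal stopping rule is of the form

$$
\tau_{opt} = \inf\left\{ k : p_{k,1}(1 + c) - c \leq A_J(p_k) \right\}.
$$

That is, a change is declared when the hyperplane on the left side is exceeded by $A_J(p_k)$ and
no change is declared, otherwise.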

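We will next see that this test characterization reduces to a degenerate one as $\rho \to 0$. To
establish this degeneracy result, along the lines of [32], we now define a one-to-one and invertible
transformation^6, $\{q_{k,\ell}, \ell = 1, \cdots, L+1\}$, as follows:

$$
q_{k,\ell} \triangleq \frac{p_{k,\ell}}{\rho \, p_{k,1}}.
$$

The inverse transformation is given by:

$$
p_{k,\ell} = \frac{q_{k,\ell}}{\sum_{j=1}^{L+1} q_{k,j}},
$$

which is equivalent to

$$
p_{k,\ell} = \frac{\rho \, q_{k,\ell}}{1 + \rho \sum_{j=2}^{L+1} q_{k,j}}, \; \ell = 2, \cdots, L+1,
\quad \text{and} \quad p_{k,1} = \frac{1}{1 + \rho \sum_{j=2}^{L+1} q_{k,j}}.
$$

We can write $q_{0,\ell}$ in terms of the priors as

$$
q_{0,1} = \frac{p_{0,1}}{\rho \, p_{0,1}} = \frac{1}{\rho},
$$
$$
q_{0,\ell} = \frac{p_{0,\ell}}{\rho \, p_{0,1}}
= \frac{P(\{\Gamma_1 = 0, \cdots, \Gamma_{\ell-1} = 0, \Gamma_\ell > 0\})}{\rho \, P(\{\Gamma_1 > 0\})}
= \frac{\prod_{j=0}^{\ell-2} \rho_{j,j+1} (1 - \rho_{\ell-1,\ell})}{\rho (1 - \rho)}, \quad \ell = 2, \cdots, L+1.
$$

Note that while $p_{k,\ell}$ are conditional probabilities of certain events and hence lie in the interval
[0, 1], the range of $q_{k,\ell}$ is in general $[0, \infty)$.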
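It can be checked that the evolution equation can be rewritten in terms of $q_{k,\ell}$ as

$$
q_{k,\ell} = \frac{1 - \rho_{\ell-1,\ell}}{1 - \rho} \left( \prod_{i=1}^{\ell-1} L_{k,i} \right)
\sum_{j=1}^{\ell} q_{k-1,j} \, w_{\ell,j}, \qquad (3)
$$

where $w_{\ell,j} = \prod_{i=j-1}^{\ell-2} \rho_{i,i+1}$ (an empty product being equal to 1).
It is interesting to note from (3) that the update for $q_{k,\ell}$ is a weighted sum of $q_{k-1,j}$, $j = 1, \cdots, \ell$,
with progressively increasing weight as $j$ increases. Similarly, we can define $J_k^T(\cdot)$ and $A_k^T(\cdot)$
in terms of $q_k$. Using the transformation $\{q_{k,\ell}\}$, $\tau_{opt}$ is seen to have the form:

$$
\tau_{opt} = \inf\left\{ k : \frac{1 - c \rho \sum_{j=2}^{L+1} q_{k,j}}{1 + \rho \sum_{j=2}^{L+1} q_{k,j}} \leq A_J(q_k) \right\}.
$$
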
When all $\Gamma_\ell$ coincide [32], we have

$$
q_{k,L+1} = \frac{p_k}{\rho (1 - p_k)} \triangleq q_k, \quad q_{k,1} = \frac{1}{\rho}, \quad q_{k,\ell} = 0, \; \ell = 2, \cdots, L.
$$



It is important to note that the transformation in [32] can be generalized in more than one direction. For example, i) 
qh.i = — ^^pvk \ ' ' ^'-^ ^'^'^ ~ ppk 'i consistent with the definition in [32]. While these definitions of qk.e ensure 

that the structure of $\tau_{opt}$ (as $\rho \to 0$) becomes simple, the recursion for $q_{k,\ell}$ (and hence, an understanding of the performance
of the proposed test) becomes more complicated. We believe that the definition of $q_{k,\ell}$, as provided here, is the most natural
generalization in the goal of understanding the performance of change process detection schemes.



Further, it is straightforward to check that the evolution in (3) reduces to

$$
q_{k,L+1} = \frac{\prod_{j=1}^{L} L_{k,j}}{1 - \rho} \cdot (1 + q_{k-1,L+1}), \qquad (4)
$$

which is [32, eqn. 32]. Thus, the space of sufficient statistics and the optimal test reduce to a
one-dimensional variable ($p_k = P(\{\Gamma_1 \leq k\} \mid I_k)$ or equivalently, $q_k$) and a threshold test on
$p_k$ (or equivalently, on $q_k$), respectively.

In the general case, unless something more is known about the structure of $A_k^T(\cdot)$ (which is
possible if there is some structure on $\{\rho_{\ell-1,\ell}\}$), we cannot say more about $\tau_{\mathrm{opt}}$. Nevertheless,
the following theorem establishes its structure in the practical setting of a rare disruption regime
($\rho \to 0$). The limiting test thresholds the a posteriori probability that no change has happened
(from below), and is denoted as $\nu_A$.

Theorem 2: The structure of $\tau_{\mathrm{opt}}$ converges to a simple threshold rule in the asymptotic limit
as $\rho \to 0$. This test is of the form:

Stop if $\log \left( \sum_{\ell=2}^{L+1} q_{k,\ell} \right) > A$
Continue if $\log \left( \sum_{\ell=2}^{L+1} q_{k,\ell} \right) \le A$

for an appropriate choice of threshold $A$.

Proof: See Appendix B. ■
The test $\nu_A$ is of low complexity because of the following properties: i) a simple recursion
formula (3) for the sufficient statistics; ii) a threshold operation for stopping; and iii) a threshold
value that can be pre-computed given the $P_{\mathrm{FA}}$ constraint (see Prop. 3). However, it is important
to note that the complexity of $\nu_A$ is not equivalent to that of the threshold test of [32] because
the recursion for the sufficient statistics depends on $L$ a posteriori probabilities, in general,
in contrast to a single parameter in [32].

The fact that $\tau_{\mathrm{opt}} \to \nu_A$ for an appropriate choice of $A$ does not imply that $\nu_A$ is asymptotically
(as $\rho \to 0$ or as $P_{\mathrm{FA}} \to 0$) optimal. However, the low complexity of this test, in addition to
Theorem 2 and the fact that the structure of $A_k^T(q_k)$ (and hence, $\tau_{\mathrm{opt}}$) is not known, suggests
that it is a good candidate test for change detection across a sensor array. In fact, we will see
this to be the case when we establish sufficient conditions under which $\nu_A$ is asymptotically
optimal.

V. Main Results on $\nu_A$

Towards this end, our main interest is in understanding the performance ($E_{\mathrm{DD}}$ vs. $P_{\mathrm{FA}}$) of $\nu_A$
for any general choice of threshold $A$. We make a few preliminary remarks before providing
performance bounds for $\nu_A$.



Special Cases of Change Parameters: We start by considering some special scenarios of change
propagation modeling. The first scenario corresponds to the case where one (or more) of the
$\rho_{\ell-1,\ell}$ is 1. The following proposition addresses this setting.

Proposition 1: Consider an $L$-sensor system described in Sec. II, parameterized by $\{\rho_{\ell-1,\ell}\}$,
where $\rho_{\ell',\ell'+1} = 1$ for some $\ell'$ and $\max_{j \ne \ell'} \rho_{j,j+1} < 1$. This system is equivalent to an $(L-1)$-sensor
system, parameterized by $\{\tilde{\rho}_{j,j+1}\}$, where

$$\tilde{\rho}_{j,j+1} = \begin{cases} \rho_{j,j+1}, & j \le \ell' - 1 \\ \rho_{j+1,j+2}, & j \ge \ell' \end{cases}$$

with the $(\ell'+1)$-th sensor observing (a combination of) $Z_{k,\ell'+1}$ and $Z_{k,\ell'+2}$ with a geometric
delay parameter of $\tilde{\rho}_{\ell',\ell'+1} = \rho_{\ell'+1,\ell'+2}$.

Proof: The proof is straightforward by studying the evolution of $\{q_{k,\ell}\}$ for the original $L$-sensor
system. From (3), it can be seen that $q_{k,\ell'+1} = 0$ (identically) for all $k$ and the reduced
$(L-1)$-dimensional system discards this redundant information, while the observation corresponding
to the $(\ell'+1)$-th sensor is carried over to the $(\ell'+2)$-th original sensor. ■

The second scenario corresponds to the case where one (or more) of the $\rho_{\ell-1,\ell}$ is 0.

Proposition 2: Consider an $L$-sensor system, parameterized by $\{\rho_{\ell-1,\ell}\}$, with $\ell'$ indicating the
smallest index such that $\rho_{\ell',\ell'+1} = 0$. This system is equivalent to an $\ell'$-sensor system with the
same parameters as that of the original system. It is as if sensors $(\ell' + 1)$ and beyond do not
exist (or contribute) in the context of change detection.

Proof: The proof is again straightforward by considering the evolution of $\{q_{k,\ell}\}$ in (3) and
noting that $q_{k,j}$, $j \ge \ell' + 2$, are identically 0 for all $k$. ■

It is useful to interpret Props. 1 and 2 via an "information flow" paradigm. If change propagation
is instantaneous across a sensor (corresponding to the first case), it is as if the fusion center
is oblivious to the presence of that sensor conditioned upon the previous sensors' observations.
In this setting, the detection delay corresponding to that sensor is zero, as would be expected
from the fact that the geometric parameter is 1. In the second case, information flow to the fusion
center (concerning change) is cut off or blocked past the first sensor with a geometric parameter
of 0. That is, the observations made by sensors $\{\ell' + 1, \cdots, L\}$ (if any) do not contribute
information to the fusion center in helping it decide whether the disruption has happened or not.
Apart from these extreme cases of oblivious/blocking sensors, we can assume without any loss
in generality that

$$0 < \min_\ell \rho_{\ell-1,\ell} \le \max_\ell \rho_{\ell-1,\ell} < 1.$$

Continuity arguments suggest that if some $\rho_{\ell-1,\ell}$ is small (but non-zero), it should be natural to
expect that the $\ell$-th sensor and beyond may not "effectively" contribute any information to the
fusion center. We will interpret this observation after establishing tractable performance bounds
for $\nu_A$.

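Probability of False Alarm: We first show that letting $A \to \infty$ in $\nu_A$ corresponds to considering
the regime where $P_{\mathrm{FA}} \to 0$.

Proposition 3: The probability of false alarm with $\nu_A$ can be upper bounded as

$$P_{\mathrm{FA}} \le \frac{1}{1 + \rho \cdot \exp(A)}.$$

That is, if $\alpha \le 1$ and the threshold $A$ is set as $A = \log \left( \frac{1}{\rho \alpha} \right)$, then $P_{\mathrm{FA}} \le \alpha$.

Proof: The proof is elementary and follows the same argument as in [29], [36]. Note that
$P_{\mathrm{FA}} = E[p_{\nu_A,1}]$ with $p_{k,1} = P(\{\Gamma_1 > k\} | I_k)$, and $\nu_A$ can also be written as

$$\nu_A = \inf \left\{ k : p_{k,1} < \frac{1}{1 + \rho \cdot \exp(A)} \right\}.$$

Thus, we have

$$P_{\mathrm{FA}} = E[p_{\nu_A,1}] \le \frac{1}{1 + \rho \cdot \exp(A)}. \qquad ■$$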
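As a small numerical sketch of how the threshold could be pre-computed from a false alarm constraint, assuming the bound $P_{\mathrm{FA}} \le 1/(1 + \rho \exp(A))$ and the choice $A = \log(1/(\rho\alpha))$; the function names are illustrative only.

```python
import math


def threshold_for_false_alarm(alpha: float, rho: float) -> float:
    """Threshold A = log(1 / (rho * alpha)) for a target false alarm level alpha."""
    if not (0.0 < alpha <= 1.0 and 0.0 < rho < 1.0):
        raise ValueError("need 0 < alpha <= 1 and 0 < rho < 1")
    return math.log(1.0 / (rho * alpha))


def false_alarm_bound(threshold: float, rho: float) -> float:
    """Upper bound 1 / (1 + rho * exp(A)) on the false alarm probability."""
    return 1.0 / (1.0 + rho * math.exp(threshold))


if __name__ == "__main__":
    alpha, rho = 1e-3, 0.001
    A = threshold_for_false_alarm(alpha, rho)
    # With this choice the bound evaluates to alpha / (1 + alpha), which is below alpha.
    print(A, false_alarm_bound(A, rho))
```

With this choice of $A$ the bound equals $\alpha/(1+\alpha)$, so the constraint is met with a small margin.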



Universal Lower Bound on $E_{\mathrm{DD}}$: We now establish a lower bound on $E_{\mathrm{DD}}$ for the class of
stopping times $\Delta_\alpha$. That is, any stopping time $\tau$ should have an $E_{\mathrm{DD}}$ larger than the lower
bound if $P_{\mathrm{FA}}$ is to be smaller than $\alpha$.

Proposition 4: Consider the class of stopping times $\Delta_\alpha = \{\tau : P_{\mathrm{FA}}(\tau) \le \alpha\}$. Under the
assumption that $\min_{\ell = 2, \cdots, L} \rho_{\ell-1,\ell} > 0$, as $\alpha \to 0$, we have

$$\inf_{\tau \in \Delta_\alpha} E_{\mathrm{DD}}(\tau) \ge \frac{|\log(\alpha)|}{L D(f_1, f_0) + |\log(1-\rho)|} \, (1 + o(1)).$$

Proof: The proof follows on similar lines as [29, Lemma 1 and Theorem 1], but with some
modifications to accommodate the change process setup. See Appendix C. ■
Upper Bound on $E_{\mathrm{DD}}$ of $\nu_A$: We will establish an upper bound on $E_{\mathrm{DD}}$ of $\nu_A$. Using this bound,
it can be seen that $\nu_A$ meets the lower bound (proved above) for an appropriate choice of $A$,
thus establishing its asymptotic optimality. The main result is as follows.

Theorem 3: Let $\{\rho_{\ell-1,\ell}\}$ be such that $0 < \min_\ell \rho_{\ell-1,\ell} \le \max_\ell \rho_{\ell-1,\ell} < 1$. Further, assume that
$D(f_1, f_0)$ is such that there exists some $j$ satisfying $\ell \le j \le L$ and



Difi, fo) > -^f-r log I ^ ) , (5) 



for all $2 \le \ell \le L$. Then, $\nu_A$ with $A = \log \left( \frac{1}{\rho \alpha} \right)$ is asymptotically optimal (as $\alpha \to 0$).
Furthermore, the performance of $\nu_A$ in this regime is of the form:

$$E_{\mathrm{DD}} = \frac{\log \left( \frac{1}{\rho} \right) + |\log(P_{\mathrm{FA}})|}{L D(f_1, f_0) + |\log(1-\rho)|} + o(1).$$

The proof of Theorem 3 in the general case of an arbitrary number ($L$) of sensors with an
arbitrary choice of $\{\rho_{\ell-1,\ell}\}$ results in cumbersome analysis. Hence, it is worthwhile considering
the special case of two sensors that can be captured by just two change parameters: $\rho$ and $\rho_{1,2}$.
The main idea that is necessary in tackling the general case is easily exposed in the $L = 2$
setting in Sec. VI. The general case is subsequently studied in Sec. VII.

VI. Expected Detection Delay: Special Case (L = 2)
The main statement in the $L = 2$ case is the following result.

Theorem 3 (L = 2): The stopping time $\nu_A$ is such that $\nu_A \to \infty$ as $A \to \infty$. Further, if
$D(f_1, f_0)$ satisfies

$$D(f_1, f_0) > \log(2 - \rho - \rho_{1,2}),$$

as $A \to \infty$, we also have

$$E_{\mathrm{DD}} = E[\nu_A] \le \frac{A}{2 D(f_1, f_0) + |\log(1-\rho)|} \, (1 + o(1)). \qquad ■$$

We will work our way to the proof of the above statement by establishing some initial results.



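Proposition 5: If $0 < \{\rho, \rho_{1,2}\} < 1$, we can recast $\{q_{k,\ell}\}$ as follows:

$$q_{k,1} = \frac{1}{\rho}, \quad q_{k,2} = \alpha_{k,2} \cdot C_1 \cdot J_2, \quad q_{k,3} = \alpha_{k,3} \cdot C_1 C_2 \cdot J_3,$$

where $C_\ell = \prod_{m=1}^{k} L_{m,\ell}$, $\ell = 1, 2$, and

$$\alpha_{k,2} = \left( \frac{1 - \rho_{1,2}}{1-\rho} \right)^k (1 + q_{0,2}), \qquad J_2 = \prod_{m=0}^{k-2} \left( 1 + \frac{1-\rho}{(1 - \rho_{1,2}) \cdot (1 + q_{m,2}) \cdot L_{m+1,1}} \right),$$

$$\alpha_{k,3} = \frac{\rho_{1,2} + \rho_{1,2} \, q_{0,2} + q_{0,3}}{(1-\rho)^k}, \qquad J_3 = \prod_{m=0}^{k-2} \left( 1 + \frac{\rho_{1,2} \cdot \left( 1 - \rho + (1 - \rho_{1,2}) \cdot L_{m+1,1} \cdot (1 + q_{m,2}) \right)}{L_{m+1,1} L_{m+1,2} \cdot (\rho_{1,2} + \rho_{1,2} \, q_{m,2} + q_{m,3})} \right).$$

Proof: We start with the recursions

$$q_{k,2} = \frac{1 - \rho_{1,2}}{1-\rho} \cdot L_{k,1} \cdot (1 + q_{k-1,2}),$$

$$q_{k,3} = \frac{L_{k,1} L_{k,2}}{1-\rho} \cdot (\rho_{1,2} + \rho_{1,2} \, q_{k-1,2} + q_{k-1,3}).$$

The expression for $q_{k,2}$ is obtained by isolating the term $(1 + q_{k-j,2})$ at every stage as $j$ increases
from 2 to $k$. The expression for $q_{k,3}$ is obtained by isolating the term $(\rho_{1,2} + \rho_{1,2} \, q_{k-j,2} + q_{k-j,3})$
at every stage as $j$ increases. ■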
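The product representation for the two-sensor case can be checked numerically against the one-step recursions. The sketch below assumes the recursions $q_{k,2} = \frac{1-\rho_{1,2}}{1-\rho} L_{k,1} (1 + q_{k-1,2})$ and $q_{k,3} = \frac{L_{k,1} L_{k,2}}{1-\rho} (\rho_{1,2} + \rho_{1,2} q_{k-1,2} + q_{k-1,3})$; the identity is purely algebraic, so the initial values and likelihood ratios passed in are arbitrary non-negative/positive numbers, and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def one_step(q2: float, q3: float, l1: float, l2: float,
             rho: float, rho12: float) -> Tuple[float, float]:
    """One step of the assumed L = 2 recursions for (q_{k,2}, q_{k,3})."""
    new_q2 = (1 - rho12) / (1 - rho) * l1 * (1 + q2)
    new_q3 = l1 * l2 / (1 - rho) * (rho12 + rho12 * q2 + q3)
    return new_q2, new_q3


def direct_recursion(l1s: Sequence[float], l2s: Sequence[float],
                     q0_2: float, q0_3: float,
                     rho: float, rho12: float) -> Tuple[float, float]:
    """Iterate the recursion over all k observations."""
    q2, q3 = q0_2, q0_3
    for l1, l2 in zip(l1s, l2s):
        q2, q3 = one_step(q2, q3, l1, l2, rho, rho12)
    return q2, q3


def product_form(l1s: Sequence[float], l2s: Sequence[float],
                 q0_2: float, q0_3: float,
                 rho: float, rho12: float) -> Tuple[float, float]:
    """Evaluate alpha_{k,2} C_1 J_2 and alpha_{k,3} C_1 C_2 J_3."""
    k = len(l1s)
    q2, q3 = q0_2, q0_3
    j2 = j3 = 1.0
    for m in range(k - 1):              # m = 0, ..., k-2
        l1, l2 = l1s[m], l2s[m]         # L_{m+1,1}, L_{m+1,2}
        j2 *= 1 + (1 - rho) / ((1 - rho12) * (1 + q2) * l1)
        j3 *= 1 + (rho12 * (1 - rho + (1 - rho12) * l1 * (1 + q2))
                   / (l1 * l2 * (rho12 + rho12 * q2 + q3)))
        q2, q3 = one_step(q2, q3, l1, l2, rho, rho12)
    c1, c2 = math.prod(l1s), math.prod(l2s)
    alpha2 = ((1 - rho12) / (1 - rho)) ** k * (1 + q0_2)
    alpha3 = (rho12 + rho12 * q0_2 + q0_3) / (1 - rho) ** k
    return alpha2 * c1 * j2, alpha3 * c1 * c2 * j3
```

Calling `direct_recursion` and `product_form` on the same sequences of likelihood ratios should return the same pair up to floating-point rounding.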
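The test $\nu_A$ can now be rewritten as

$$\nu_A = \inf \left\{ k : \log(q_{k,2} + q_{k,3}) > A \right\}$$

$$= \inf \left\{ k : \log(\alpha_{k,2} \cdot C_1 \cdot J_2 + \alpha_{k,3} \cdot C_1 C_2 \cdot J_3) > A \right\}$$

$$= \inf \left\{ k : \log(\alpha_{k,2} \cdot C_1 \cdot J_2) + \log \left( 1 + C_2 \cdot \frac{\alpha_{k,3}}{\alpha_{k,2}} \cdot \frac{J_3}{J_2} \right) > A \right\}.$$

We need the following preliminaries in the course of our analysis.
Lemma 1: Since $q_{m,2} \ge 0$, note that $J_2$ can be trivially upper bounded as

$$J_2 \le \prod_{m=1}^{k-1} \left( 1 + \frac{1-\rho}{(1 - \rho_{1,2}) \cdot L_{m,1}} \right).$$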

Lemma 2: If $\{x, x_1, x_2, \cdots\}$ are i.i.d. with $x > 0$ and $E[\log(x)] > 0$, then



\ m=l / 



^ ^ ^ a.s. and m mean. 



k 

If $\{x, x_1, x_2, \cdots\}$ are i.i.d. with $x > 0$ and $E[\log(x)] < 0$, then



a.s. and in mean. 



Note that both these conclusions are true even if $\{x_m\}$ are not i.i.d. (or even independent) as
long as the condition on the sign of $E[\log(x)]$ can be replaced with an almost sure (and in mean)
statement on the sign of $\lim_k \frac{1}{k} \sum_{m=1}^{k} \log(x_m)$ (or an appropriate variant thereof). ■
The following statement, commonly referred to as Blackwell's elementary renewal theorem
[35, pp. 204-205], is needed in our proofs.

Lemma 3: Let $X_m$ be i.i.d. positive random variables and define $T_m$ as follows:

$$T_m = T_{m-1} + X_m, \;\; m \ge 1 \quad \text{and} \quad T_0 = 0.$$

The number of renewals in $[0, t]$ is $N_t = \inf \{ k : T_k > t \}$. Then, we have

$$\frac{N_t}{t} \to \frac{1}{\mu} \;\; \text{a.s. as } t \to \infty \quad \text{and} \quad \frac{E[N_t]}{t} \to \frac{1}{\mu} \;\; \text{as } t \to \infty,$$

where $\mu = E[X_m] \in (0, \infty]$. ■
Proof of Theorem 3 (L = 2): We will postpone the proof of the first statement to Sec. VII
when we consider the general case in Prop. 8. For the second statement, we first use the bound
for $J_2$ from Lemma 1 and the fact that $q_{m,\ell} \ge 0$, and thus we have



log 1 + Q 



2 ■ 



^\ > log 1 + a 



2 ■ 



ak,3 
ak,2 



nt^i ( 1 + 



(1-Pl,2)im, 



> l0£ 



Pi, 2 ■ -^m,2 



7Tt=l 



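Now, observe that

$$E \left[ \log \left( \frac{L_{m,2}}{(1 - \rho_{1,2}) \cdot \left( 1 + \frac{1-\rho}{(1 - \rho_{1,2}) L_{m,1}} \right)} \right) \right]$$

$$= D(f_1, f_0) + \log \left( \frac{1}{1 - \rho_{1,2}} \right) - E \left[ \log \left( 1 + \frac{1-\rho}{(1 - \rho_{1,2}) L_{m,1}} \right) \right]$$

$$\ge D(f_1, f_0) + \log \left( \frac{1}{1 - \rho_{1,2}} \right) - \log \left( 1 + E \left[ \frac{1-\rho}{(1 - \rho_{1,2}) L_{m,1}} \right] \right)$$

$$= D(f_1, f_0) - \log(2 - \rho - \rho_{1,2}) > 0,$$

where the first equality follows since $\rho_{1,2} > 0$ (change has to eventually happen at the second
sensor to ensure that $E[\log(L_{m,2})] = D(f_1, f_0)$), the second step follows from Jensen's inequality
and the third equality from the fact that $E_{f_1} \left[ \frac{1}{L_{m,1}} \right] = 1$. Using this fact in conjunction with
Lemma 2 and noting that $\rho_{1,2} > 0$, as $k \to \infty$, we have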

log{ak,2■C^■J2)+log(l + C2■^■^] > log (C1C2 • «m • J3) 

V "fc,2 J2J 



m=l 



1/k J J 
Pi, 2 ■ ^m,l ■ ^m,,2 

1-P 



The above relationship implies that $\nu_A \le \nu_{L,A}$ where

$$\nu_{L,A} = \inf \{ k : L_k > A \}.$$

Applying Lemma 3 (since the entries in the definition of $\nu_{L,A}$ are independent) and the first
statement of the theorem that $\nu_A \to \infty$ as $A \to \infty$, we have

$$\frac{E[\nu_A]}{A} \le \frac{E[\nu_{L,A}]}{A} \xrightarrow{A \to \infty} \frac{1}{2 D(f_1, f_0) + |\log(1-\rho)|}.$$



VII. Expected Detection Delay: General Case (L ≥ 3)

We now consider the general case where $L \ge 3$. The main statement here is as follows.
Theorem 3 (L ≥ 3): If $D(f_1, f_0)$ is such that the condition (5) is satisfied, as $A \to \infty$, we
have

$$E_{\mathrm{DD}} = E[\nu_A] \le \frac{A}{L D(f_1, f_0) + |\log(1-\rho)|} \, (1 + o(1)). \qquad ■$$

As before, we will work towards the proof of this statement. For this, the following generalizations
of Prop. 5 and Lemma 1 are necessary.
Proposition 6: We have

e-l k k-2 

qk,i = Oik/ ■ JJ JJ Lm,j ■ n + ^ = 2, ■ ■ ■ , ^ + 1 where 




Cm,e 



1 2^7=1 1m,j '^j ^m+l,j,i 



1-p 



V 



Bm,n,i — (1 — Pp,p+l) ■ Y\. n — 1, ■ ■ ■ , i 

p=n—l j=l 

e-l 

Cm,n,t = Bm,n,l ^ (1 ~ Pi-l,e) ' Y\, ^ = 1, - ■ ■ ,i. 

Proof: The proof is provided in Appendix |D] for the sake of completeness. Also, see 
Appendix iDl for how this proposition can be reduced to the case of [32]. ■ 
Lemma 4: The following upper bound for (rn/ is obvious when maxp£_i £ < 1: 

s - — 



-1/) ■ Lm+l,j (1 - Pe-1/) ■ Lm+l,j 



From Prop. [6l can be conveniently rewritten as 



ua = inf <^ log 

k 



akP ■ Ci- ■ -Ci^i ■ JA > a 



Unlike the setting in Sec. VI, the structure of $\nu_A$ (as of now) is not amenable to studying $E_{\mathrm{DD}}$
(in further detail). This is because it has the form of the log of a sum of random variables (see [36]
for similar difficulties in the multi-hypothesis testing problem). We alleviate this difficulty by
rewriting the test statistic in terms of quantities whose asymptotics can be easily studied.

Proposition 7: We have the following expansion for the test statistic:



/L+l 

log I ^ ak,e ■ Ci ■ ■ • Ci-i ■ Ji 



log (afc,2 ■ Ci ■ J2) + XI (1 + 



Oiki ■ Je 



log 



1 - Pl,2 \ 2-p-pi^2 
1-p 



Ci ■ J2 



+ 5^ log (l + Vi-f3k,i-Ce-^) 



where 



kl 



Ve+1 



Oik,i+l 

aki 



1 - 



7+1 



Vi-Pk,i-C^-^ 



Je 



l + Vi-Pk,i-C,-^ 



Pe-i,e ■ U + 



2,- 



1 — Pe,e+i 



E 



1 



£-1 ■ 

m=0 



Pm,m+1 



with r]2 = I. 

Proof: The proof is straightforward by using the induction principle. ■
The following proposition establishes the general asymptotic trend of $\nu_A$.
Proposition 8: The test $\nu_A$ is such that $\nu_A \to \infty$ a.s. as $A \to \infty$.

Proof: See Appendix |Dl ■ 
As we try to understand ua further, it is important to note that the behavior of the decision 
statistic of va is determined (only) by the trends of 

Ji+i 



Jf 



This is so because the asymptotics of $\{\eta_\ell\}$ are also primarily determined by the trends of $\{x_\ell\}$.
We now develop the generalized version of the heuristic in Sec. VI for the upper bound of $E_{\mathrm{DD}}$.



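Consider the case where $L = 4$. The second piece in the description of the test statistic (in
Prop. 7) can be written as

$$\mathcal{L} = \log(1 + \eta_2 x_2) + \log(1 + \eta_3 x_3) + \log(1 + \eta_4 x_4)$$

where the evolution of $\eta_\ell$ and $x_\ell$, $\ell = 2, 3, 4$ is described in Prop. 7. In the regime where $k \to \infty$,
note that if $x_2 \to \infty$ (with high probability), then $\eta_3 \to 1$. On the other hand, if $x_2 \to 0$ (with
high probability), then $\eta_3 \to x_2$. Thus, we can identify (and partition) eight cases as follows:

Case 1: $x_2 \to 0, \; x_2 x_3 \to 0, \; x_2 x_3 x_4 \to 0 \implies \eta_3 \to x_2, \; \eta_4 \to x_2 x_3 \implies \mathcal{L} \to 0$
Case 2: $x_2 \to 0, \; x_2 x_3 \to 0, \; x_2 x_3 x_4 \to \infty \implies \eta_3 \to x_2, \; \eta_4 \to x_2 x_3 \implies \mathcal{L} \to \log(x_2 x_3 x_4)$
Case 3: $x_2 \to 0, \; x_2 x_3 \to \infty, \; x_4 \to 0 \implies \eta_3 \to x_2, \; \eta_4 \to 1 \implies \mathcal{L} \to \log(x_2 x_3)$
Case 4: $x_2 \to 0, \; x_2 x_3 \to \infty, \; x_4 \to \infty \implies \eta_3 \to x_2, \; \eta_4 \to 1 \implies \mathcal{L} \to \log(x_2 x_3 x_4)$
Case 5: $x_2 \to \infty, \; x_3 \to 0, \; x_3 x_4 \to 0 \implies \eta_3 \to 1, \; \eta_4 \to x_3 \implies \mathcal{L} \to \log(x_2)$
Case 6: $x_2 \to \infty, \; x_3 \to 0, \; x_3 x_4 \to \infty \implies \eta_3 \to 1, \; \eta_4 \to x_3 \implies \mathcal{L} \to \log(x_2 x_3 x_4)$
Case 7: $x_2 \to \infty, \; x_3 \to \infty, \; x_4 \to 0 \implies \eta_3 \to 1, \; \eta_4 \to 1 \implies \mathcal{L} \to \log(x_2 x_3)$
Case 8: $x_2 \to \infty, \; x_3 \to \infty, \; x_4 \to \infty \implies \eta_3 \to 1, \; \eta_4 \to 1 \implies \mathcal{L} \to \log(x_2 x_3 x_4)$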

In all the eight cases, we have a universal description for $\mathcal{L}$ (as $k \to \infty$) that holds with high
probability:

$$\mathcal{L} \sim \sum_{m=2}^{\ell^\star - 1} \log(x_m), \qquad \ell^\star = \arg\min \left\{ \ell : \prod_{m=\ell}^{j} x_m \to 0 \text{ for all } j \ge \ell \right\}.$$

If $\ell^\star = 2$, then the above summation is replaced by 0, and if there exists no $\ell \in \{2, 3, 4\}$ such
that the above condition holds, then $\ell^\star$ is set to 5.
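The heuristic can be sanity-checked numerically with a minimal sketch. The assumptions here are illustrative and not taken from the paper: each $x_m$ is modeled deterministically as $\exp(\gamma_m k)$ for a growth rate $\gamma_m$, and $\eta$ is updated through the simplified recursion $\eta_{\ell+1} = \eta_\ell x_\ell / (1 + \eta_\ell x_\ell)$ with $\eta_2 = 1$, which reproduces the two limiting behaviours quoted above. The function names are hypothetical.

```python
import math
from typing import Sequence


def first_noncontributing_index(gammas: Sequence[float]) -> int:
    """Return l* for growth rates (gamma_2, ..., gamma_L).

    l* is the smallest l such that every partial sum gamma_l + ... + gamma_j
    (j >= l) is negative, i.e. every partial product of x_m decays to zero.
    If no such l exists, L + 1 is returned.
    """
    n = len(gammas)
    for start in range(n):
        partial = 0.0
        all_decay = True
        for j in range(start, n):
            partial += gammas[j]
            if partial >= 0.0:
                all_decay = False
                break
        if all_decay:
            return start + 2
    return n + 2


def heuristic_slope(gammas: Sequence[float]) -> float:
    """Per-step growth of the statistic predicted by the heuristic."""
    l_star = first_noncontributing_index(gammas)
    return sum(gammas[: l_star - 2])


def direct_slope(gammas: Sequence[float], k: int) -> float:
    """Evaluate sum_l log(1 + eta_l x_l) / k under the simplified eta recursion."""
    eta, total = 1.0, 0.0
    for g in gammas:
        x = math.exp(g * k)
        total += math.log1p(eta * x)
        eta = eta * x / (1.0 + eta * x)
    return total / k
```

For example, the rates `(-1.0, 2.0, -0.5)` correspond to Case 3: `first_noncontributing_index` returns 4 and both slope functions return a value close to 1 for large `k`.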

The following proposition provides a precise mathematical formulation of the above heuristic. 



Proposition 9: Let the following limit be well-defined and be denoted as 7^ ., : 



Define i* as 



7f 



,4 lim i y log f 

m=l ^ ' 



/)* A 



arg mm 

e:2<e< L 



I A^j < for all j 



^, ■ ■ ■ , l| where 



log ( l)Dif,, /o) + 7.,. 

J- — Pe-i,£ 



(6) 



If there exists no element in the set for the argmin operation in we set 
as A — > oo (and hence, k = ua oo a.s. from Prop. [8]), we have 

1 ^ 1^*"^ 

i=2 1=2 

If = 2, then the second term in the above expression is set to 0. 
Proof: See Appendix |Dl 
Following Props. [8] and [9l as A — > oo, z/^ can be restated as 



L + 1. Then, 



(7) 



inf I 5^ f log (^Y^) + + + ^--2) + 1 E ) > ^ 



inf <^ ^log 



m=l 



1 - P^*-l.^* 
1-p 



+ log(L^j) + log (1 + Cm/*) > ^ 



(8) 



with t defined in ©. 

Observe that if the condition in Prop. 9 is satisfied, the first $\ell^\star - 1$ sensors contribute to the
slope of $E_{\mathrm{DD}}$ and the rest of the sensors $\ell^\star, \cdots, L$ (if any) do not contribute to the slope. It is
useful to understand the conditions under which $\ell^\star = L + 1$.

Theorem 3 provides a simple condition such that the observations from all the $L$ sensors
contribute to the slope. We are now prepared to prove it.

Proof of Theorem > 3); First, using Lemma |4] note that, we can bound A^j as 



Ae, >{j-i+ l)D{f,, /o) + log(l - p,- - E 



'e-i 



log E 



(1 - Pp. 



p+l) 



Using Jensen's inequality and noting that Ej-^ 



nj=p+i 



1, ([5]) is sufficient to ensure that 



for all $\ell = 2, \cdots, L$, there exists some $j \ge \ell$ such that $\Lambda_{\ell,j} > 0$. It is important to realize that
the above condition is necessary as well as sufficient for $\ell^\star = L + 1$. Thus, under the assumption
that (5) holds, invoking Prop. 8 as $A \to \infty$ (that is, letting $k = \nu_A \to \infty$ a.s. and using Prop. 9),
$\nu_A$ can be written as



inf (j2 log(^™/) + log (y^) + log(l + Cm,L+i) j > a| 



Note that since Cm,L+i > 0, we have 



J2 (j2 iog(^™,^) + log (y^) + i°g(i + Cm,L+i) ) >J2\Y1 iog(^™/) + log 

m=l \e=i VP/ / \^=1 ^ 



Lk 



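and hence, $\nu_A \le \nu_{L,A}$ where

$$\nu_{L,A} = \inf \{ k : L_k > A \}.$$

Thus, we have

$$\frac{E[\nu_A]}{A} \le \frac{E[\nu_{L,A}]}{A} \xrightarrow{A \to \infty} \frac{1}{L D(f_1, f_0) + \log \left( \frac{1}{1-\rho} \right)},$$

where the convergence is again due to Lemma 3. ■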

VIII. Discussion and Numerical Results 

Discussion: A loose sufficient condition for all the L sensors to contribute to the slope of Edq 
of ua is that 

D{fi, fo) > max mm ■ log = 7„. 

i=i,- ,L-i j>i+i J - i y 1 - pjj+i ' 

Another sufficient condition is that 

D{fi, fo) > ^^Y^^^-i ■ log ^1 - P + J]^^ ~ ^JJ+i) 

That is, if $\rho$ is such that

$$\rho \ge \sum_{\ell=2}^{L} (1 - \rho_{\ell-1,\ell}),$$

then $\gamma_u \le 0$ and the condition of Theorem 3 reduces to a mild one that the K-L divergence
between $f_1$ and $f_0$ be positive. A special setting where the above condition is true (irrespective
of the rarity of the disruption-point) is the regime where change propagates across the sensor
array "quickly." The case of [32] is an extreme example of this regime and Theorem 3 recaptures
this extreme case.

In more general regimes where change propagates across the sensor array "slowly", either the
disruption-point should become less rare (independent of the choice of $f_1$ and $f_0$) or the
densities $f_1$ and $f_0$ should be sufficiently discernible (independent of the rarity of the disruption-point)
so that all the $L$ sensors can contribute to the asymptotic slope. When these conditions fail to
hold, it is not clear whether the theorems are applicable, or even if all the $L$ sensors contribute
to the slope of $E[\nu_A]$. Nevertheless, it is reasonable to conjecture that as long as $\min_\ell \rho_{\ell-1,\ell} > 0$,
all the $L$ sensors contribute to the asymptotic slope.

However, the difference between the asymptotic and the non-asymptotic regimes needs a careful
revisit. Following the initial remark (Prop. 2) on the extreme case of blocking sensors (where
some $\rho_{\ell-1,\ell} = 0$), in the more realistic case where some $\rho_{\ell-1,\ell}$ may be small (but non-zero),
it is possible that if $D(f_1, f_0)$ is smaller than some threshold value (determined by the change
propagation parameters), not all of the $L$ sensors may "effectively" contribute to the slope of
$E_{\mathrm{DD}}$, at least for reasonably small, but non-asymptotic values of $P_{\mathrm{FA}}$. For example, see the
ensuing discussion where numerical results illustrate this behavior at Pfa values of 10^'' to 10^^ 
for some choice of change propagation parameters, even when the condition in Theorem 3 is
met. When the condition in Theorem 3 is not met, such a behavior is expected to be more
typical.

The final comment is on the approach pursued in this paper. While the approach pursued in
Sec. VI and VII results in interesting conclusions, it is not clear if this approach is fundamental in
the sense that this is the only approach possible for characterizing $E_{\mathrm{DD}}$ vs. $P_{\mathrm{FA}}$. Furthermore, this
approach assumes the existence of $\{\gamma_{\ell,j}\}$. Even if these quantities exist and are, hence, theoretically
computable, such a computation is complicated by the fact that $\{\zeta_{m,\ell}, m = 1, \cdots, k\}$ are
correlated. Thus, verification of the exact condition in Prop. 9 (equivalently, computing $\ell^\star$) has
to be achieved either via Monte Carlo methods or by bounding $\gamma_{\ell,j}$, as done here. Furthermore,
correlation of $\{\zeta_{m,\ell}\}$ and hence, $U_m$ (see (8)) implies that statistics of $\nu_A$ have to be obtained
using non-linear renewal theoretic techniques for general (correlated) random variables [37]. This
is the subject of current work.

Numerical Study I - Performance Improvement with $\nu_A$: Given that the structure of $\tau_{\mathrm{opt}}$ is not
known in closed-form, we now present numerical studies to show that $\nu_A$ results in substantial
improvement in performance over both a single sensor test (which uses the observations only from
the first sensor and ignores the other sensor observations) and a test that uses the observations
from all the sensors but under a mismatched model (where the change-point for all the sensors
is assumed to be the same), even under realistic modeling assumptions.

The first example corresponds to a two sensor system where the occurrence of change is
modeled as a geometric random variable with parameter $\rho = 0.001$. Change propagates from
the first sensor to the second with the geometric parameter $\rho_{1,2} = 0.1$. The pre- and post-change
densities are $\mathcal{N}(0, 1)$ and $\mathcal{N}(1, 1)$, respectively, so that $D(f_1, f_0) = 0.50$. Fig. 2 shows that
$\nu_A$ can result in an improvement of at least 4 units of delay at even marginally large $P_{\mathrm{FA}}$ values
on the order of 10^^. 
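The K-L divergence values quoted in the numerical studies follow from the closed form for two unit-variance Gaussians that differ only in their means, $D(f_1, f_0) = (\mu_1 - \mu_0)^2 / (2\sigma^2)$. A minimal sketch (the function name is illustrative):

```python
def kl_gaussian_mean_shift(mu0: float, mu1: float, sigma2: float = 1.0) -> float:
    """K-L divergence D(f1, f0) between N(mu1, sigma2) and N(mu0, sigma2)."""
    if sigma2 <= 0.0:
        raise ValueError("variance must be positive")
    return (mu1 - mu0) ** 2 / (2.0 * sigma2)


if __name__ == "__main__":
    # Post-change means used in the three numerical examples of this section.
    for mu1 in (1.0, 0.75, 1.2):
        print(mu1, kl_gaussian_mean_shift(0.0, mu1))
```

The three means give approximately 0.50, 0.2813 and 0.72, matching the values stated in the text.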

The second example corresponds to a five sensor system where $\rho = 0.005$. Change propagates
across the array according to the following model: $\rho_{1,2} = 0.1$, $\rho_{2,3} = 0.2$, $\rho_{3,4} = 0.5$ and $\rho_{4,5} = 0.7$.
The pre- and the post-change densities are $\mathcal{N}(0, 1)$ and $\mathcal{N}(0.75, 1)$ so that $D(f_1, f_0) \approx 0.2813$.
With $D(f_1, f_0)$ and the change parameters as above, Theorem 3 assures us that at least
$L = 2$ sensors contribute to the $E_{\mathrm{DD}}$ vs. $P_{\mathrm{FA}}$ slope asymptotically. On the other hand, Fig. 3
shows that more than two sensors indeed contribute to the slope. Thus, it can be seen that
Theorem 3 provides only a sufficient condition on performance bounds. It is also worth noting
the transition in slope (unlike the case in [32]) for both the mismatched test and $\nu_A$ as $P_{\mathrm{FA}}$
decreases from moderately large values to zero, whereas the slope of the single sensor test (as
expected) remains constant.

Fig. 2. False alarm vs. Expected detection delay for a L = 2 setting with $\rho = 0.001$ and $\rho_{1,2} = 0.1$.

Fig. 3. False alarm vs. Expected detection delay for a typical L = 5 setting.

Numerical Study II - Performance Gap Between the Tests: We now present a second case
study with the main goal being the understanding of the relative performance of $\nu_A$ with respect
to the single sensor and the mismatched tests. We again consider a $L = 2$ sensor system and
we vary the change process parameters, $\rho$ and $\rho_{1,2}$, in this study. The pre- and the post-change
densities are $\mathcal{N}(0, 1)$ and $\mathcal{N}(1.2, 1)$ so that $D(f_1, f_0) = 0.72$.




Fig. 4. False alarm vs. Expected detection delay for a L = 2 setting with different model parameters.

Fig. 4 and Fig. 5(b) show the performance of the three tests with varying $\rho$ parameters for a
fixed choice of $\rho_{1,2}$. We observe that the gap in performance between the single sensor test and
$\nu_A$ increases as $\rho$ decreases, whereas the gap between $\nu_A$ and the mismatched test stays fairly
constant. Similarly, Fig. 5 shows the performance of the three tests with varying $\rho_{1,2}$ parameters
for a fixed choice of $\rho$. We observe from these plots that the gap between the mismatched test and
$\nu_A$ increases as $\rho_{1,2}$ decreases, whereas the gap between the single sensor test and $\nu_A$ increases
as $\rho_{1,2}$ increases.

Fig. 5. False alarm vs. Expected detection delay for a L = 2 setting with different model parameters.



The choice of $D(f_1, f_0) = 0.72$ is such that the sufficient condition in Theorem 3 is satisfied,
independent of the change parameters. Hence, we expect the slope of the $E_{\mathrm{DD}}$ vs. $P_{\mathrm{FA}}$ plot to be
of the form $\frac{1}{2 D(f_1, f_0) + |\log(1-\rho)|}$ (asymptotically as $P_{\mathrm{FA}} \to 0$). Nevertheless, Fig. 5(c) and (d) show
that, when both $\rho$ and $\rho_{1,2}$ are small, the slope of $\nu_A$ is only as good as (or slightly better than)
the single sensor test, which is known to have a slope of the form $\frac{1}{D(f_1, f_0) + |\log(1-\rho)|}$. Thus, we
see that even though our theory guarantees that both the sensors' observations contribute in the
eventual performance of $\nu_A$ asymptotically, we may not see this behavior for reasonable choices
of PpA like 10""^. The case of observation models not meeting the conditions of Theorem |3] is 
expected to show this trend for even lower PpA values. 
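The claim that $D(f_1, f_0) = 0.72$ meets the two-sensor condition for every choice of change parameters can be checked with a short sketch. It assumes the $L = 2$ condition $D(f_1, f_0) > \log(2 - \rho - \rho_{1,2})$ from Sec. VI and the asymptotic slope $1 / (L \, D(f_1, f_0) + |\log(1-\rho)|)$; the function names are illustrative.

```python
import math


def l2_condition_margin(divergence: float, rho: float, rho12: float) -> float:
    """Margin D - log(2 - rho - rho12); a positive value means the condition holds."""
    return divergence - math.log(2.0 - rho - rho12)


def asymptotic_slope(num_sensors: int, divergence: float, rho: float) -> float:
    """Asymptotic slope 1 / (L * D + |log(1 - rho)|) of delay vs. log(1 / P_FA)."""
    return 1.0 / (num_sensors * divergence + abs(math.log(1.0 - rho)))


if __name__ == "__main__":
    # log(2 - rho - rho12) < log(2) < 0.72 for any rho, rho12 in (0, 1).
    grid = [i / 20.0 for i in range(1, 20)]
    print(min(l2_condition_margin(0.72, r, r12) for r in grid for r12 in grid))
    # Two contributing sensors give a smaller slope than a single sensor.
    print(asymptotic_slope(2, 0.72, 0.001), asymptotic_slope(1, 0.72, 0.001))
```

Since $\log(2 - \rho - \rho_{1,2}) < \log 2 \approx 0.693$, the margin is positive on the whole parameter range. Under the same stated condition, the first example of Numerical Study I ($D(f_1, f_0) = 0.50$, $\rho = 0.001$, $\rho_{1,2} = 0.1$) gives a negative margin, so the sufficient condition is not met there.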

To summarize these observations, if $E_{\mathrm{DD}, \nu_A}$, $E_{\mathrm{DD, MM}}$ and $E_{\mathrm{DD, SS}}$ denote the expected detection
delays for $\nu_A$, mismatched and single sensor tests (respectively) for some fixed choice of
$P_{\mathrm{FA}}$, then

$$E_{\mathrm{DD, MM}} - E_{\mathrm{DD}, \nu_A} \propto \frac{1}{\rho_{1,2}}, \;\; \text{independent of } \rho,$$

$$E_{\mathrm{DD, SS}} - E_{\mathrm{DD}, \nu_A} \propto \frac{\rho_{1,2}}{\rho}.$$

It is interesting to note from the above equations that $\rho_{1,2}$ impacts the gap between the two tests
in a contrasting way. The test $\nu_A$ is expected to result in significant performance improvement
in the regime where $\rho$ is small, but $\rho_{1,2}$ is neither too small nor too large. In fact, this regime
where $\nu_A$ is expected to result in significant performance improvement is the precise regime
that is of importance in practical contexts. This is so because we can expect the occurrence of
disruption (e.g., cracks in bridges, intrusions in networks, onset of epidemics etc.) to be a rare
phenomenon. Once the disruption occurs, we expect change to propagate across the sensor array
fairly quickly due to the geographical proximity (network proximity in the case of computer networks)
of the other sensors, but not so quickly that the extreme case of [32] is applicable.
Classifying the regime of $\{\rho_{\ell-1,\ell}\}$ and $D(f_1, f_0)$ where significant performance improvement is
possible with $\nu_A$ is ongoing work. It is also of interest to come up with better test structures in
the regime where $\nu_A$ does not lead to a significant performance improvement.

IX. Concluding Remarks 

We considered the centralized, Bayesian version of the change process detection problem in
this work and posed it in the classical POMDP framework. This formulation of the change
detection problem allows us to establish the sufficient statistics for the DP under study and a
recursion for the sufficient statistics. While we obtain the broad structure of the optimal stopping
rule ($\tau_{\mathrm{opt}}$), any further insights into it are rendered infeasible by the complicated nature of the
infinite-horizon cost-to-go function. Nevertheless, $\tau_{\mathrm{opt}}$ reduces to a threshold rule (denoted in
this work as $\nu_A$) in the rare disruption regime. The test $\nu_A$ possesses many attractive properties:
i) it is of low complexity; ii) it is asymptotically optimal in the vanishing false alarm probability
regime under certain mild assumptions on the K-L divergence between the post- and the pre-change
densities; and iii) numerical studies suggest that it can lead to substantially improved
performance over naive tests. Thus, $\nu_A$ serves as an attractive test for practical applications that
can be modeled as a change process.

To the best of our knowledge, this is the first work to consider the change process detection
problem in extensive detail. Thus, there exists potential for extending this work in multiple
new directions. While we established the asymptotic optimality of $\nu_A$ when $D(f_1, f_0) > \gamma_u$,
it is unclear as to what happens when $D(f_1, f_0) < \gamma_u$. In other words, is $\ell^\star = L + 1$ when
$D(f_1, f_0) < \gamma_u$ given that $\gamma_u > 0$? It is most likely that $\nu_A$ is asymptotically optimal even in
this regime as long as $\min_\ell \rho_{\ell-1,\ell} > 0$, but establishing this result may involve some ingenious
techniques. However, if $\nu_A$ is not asymptotically optimal in this regime, it is of interest to design
better low-complexity stopping rules; e.g., threshold tests on weighted sums of the a posteriori
probabilities based on further study of the structure of $\tau_{\mathrm{opt}}$ etc.

More careful asymptotic analysis of $\nu_A$ and the performance gap between: i) $\nu_A$ and the
mismatched test, ii) $\nu_A$ and the single sensor test, and iii) $\nu_A$ and weighted threshold tests etc.
would involve tools from non-linear renewal theory [26], [29], [37] and is the subject of current
attention. Such an asymptotic study could in turn drive the design of better test structures. Our
numerical results also illustrate and motivate the need for non-asymptotic characterization
(piecewise linear approximations of the $E_{\mathrm{DD}}$ vs. $P_{\mathrm{FA}}$ curve) of the proposed tests. Unlike the case of
instantaneous change propagation [29], [32], we showed that asymptotic characterizations may
not kick in quickly for small $P_{\mathrm{FA}}$ values if the change propagates too "slowly" across the sensor
array. Under such circumstances, it is also of interest to revisit the precise definition of optimality
of a stopping rule.

Decentralized [32], [34], censored [38], multi-channel [18] and robust [39], [40] versions of 
change detection are motivated by these constraints. Extensions of this work to more general 
observation models are important in the context of practical applications. For example, non- 
iid [29] and Hidden-Markov models [24] have found increased interest in biological problems 
determined by an event-driven potential [6], [7]. Practical applications will in turn drive the need 
for understanding change detection with certain specific observation models. 

Appendix 

A. Completing Proof of Theorem 1: Establishing Concavity of $A_k^T(\cdot)$ and $J_k^T(\cdot)$

We now show that $A_k^T(p_k)$ and $J_k^T(p_k)$ are concave in $p_k$. First, note that $J_T^T(p_T) = p_{T,1}$ is
concave in $p_T$ because it is affine. Using the recursion for $p_k$, it is straightforward to check that

$$A_{T-1}^T(p_{T-1}) = E[J_T^T(p_T) | I_{T-1}] = p_{T-1,1} \cdot (1 - \rho).$$

Using this in the definition of $J_{T-1}^T(p_{T-1})$, we have

$$J_{T-1}^T(p_{T-1}) = \begin{cases} p_{T-1,1}, & 0 \le p_{T-1,1} \le \frac{c}{c+\rho} \\ c + p_{T-1,1} (1 - \rho - c), & \frac{c}{c+\rho} \le p_{T-1,1} \le 1. \end{cases}$$

Since $A_{T-1}^T(p_{T-1})$ and $J_{T-1}^T(p_{T-1})$ are affine and piecewise-affine in $p_{T-1,1}$, respectively,
they are concave. (It is important to note that the slope of the second affine part, which is
$1 - \rho - c$, is smaller than that of the first, which is 1.)

We now assume that Jt+iiPk+i) concave in p^^i and show that A]^{pi^) is also concave 
in p^. For this, consider XA^pl) + (1 — X)Al{pl) with pi and pi being two elements in the 
standard L-dimensional simplex. We have 



^Al{pl) + {l-X)Al{pl) = j 

-I 



XJk+i (pI+i) /^i + (1 - A) (p^+i) /X2 

fiJi^+, (pUi) + (1 - ^')Jk+l {pI+i) 



•Zfe+i— 2 



dz 



X (A/xi + (1 - A)/X2 



Zfc+i=z 



dz 



where 

IJ'i 



f{Zk+i\h) 



E 



km=l 



, i = 1, 2, and 



A/xi + (1 - A)/X2 

Using the concavity of J^i(-), we can upper bound the above as follows: 

A^Kp^) + (1 - X)Al{pl) < J [ Jfe^+i (/xpUi + (1 - 

X ( A/ii + (1 - A)/i2 ) c^z 

V /J Zfc+l=Z 

If we define 

pI = >^pI + (1 - a)pL 

it is straightforward to check that 

pI+1 = /^Pfc+i + (1 - 

Using these facts, we have

$$\lambda A_k^T(p_k^1) + (1 - \lambda) A_k^T(p_k^2) \le A_k^T(\lambda p_k^1 + (1 - \lambda) p_k^2),$$

thus establishing the concavity of $A_k^T(\cdot)$. The concavity of $J_k^T(\cdot)$ follows since the minimum and
sum of concave functions are concave. An inductive argument completes the proof. ■



B. Proof of Theorem |2] 
We will show that 



Stop 



'opt 



< 



Continue if ^.^^ qk,j _ ^ 



for an appropriately chosen function h{p) that satisfies lim h(p) = 0. We start with the finite- 
horizon DP and define and ^'fc as follows: 



1 



■JkiQk 



0< k<T~l. 



The main idea behind the proof is to show that and ^'fe are bounded by a function of p (that 
goes to as p ^ 0), uniformly for all k. Thus, the structure of the test in the limit as p ^ 
can be obtained. 

Towards this goal, note from Appendix lAl that = i = 0. Also, note that J'^-i{q.t-i) 
can be written as 



T-l\Q.T-l) 



I+pESSt-i,, 



E 



jj=2 

L+l ^ 
j=2 (IT-IJ > 



which can be equivalently written as 



T-l 



.j=2 



1 +PEi=2 <1T-1,3 




Note that < i < P and we have 

1 -cEj=2 ^T-lj 



< ^[$T-i|/t-2] 

^L + l 



. 1 + PEj=2 



*T-2 = Pfi'2(p) where 

L+l 

i=2 




iT-2 



Now observe that $X_\rho$ can be rewritten as



1 -cEj-2 

1 + P Ej=2 IT-IJ 



1 



c + p 



Furthermore, $X_\rho \leq 1$ for all $\rho$ and the set within the indicator function (above) converges to the
empty set as $\rho \downarrow 0$. Thus, a straightforward consequence of the bounded convergence theorem
for conditional expectation [35] is that

$$\lim_{\rho \downarrow 0} g_2(\rho) = \lim_{\rho \downarrow 0} \frac{\Psi_{T-2}}{\rho} = 0,$$

independent of the choice of $T$.

Plugging the above relation in the expression for $J_{T-2}^T(\mathbf{q}_{T-2})$, we have



Jt-2{Qt-2) = mill 



1 



1 , v^-L+l 



1 + P Z2j=2 <lT-2,j 1 + P 22j=2 (lT-2,j 



mm 



T-2 



L+1 

j=2 ?T-2J 



1 + 



l + PEt=2 ?T-2,. 



l + PEf=2 ?T-2 



- $ 



T-2 



T-2 — 



1 + PE,=2 (lT-2,j 



^j=2 

l-cEj=2 gr-2j 

l + PEt=2 ?^-2J 



•L+1 ^ 1 

II <:5^gT-2j< 
.i=2 



*7 



c 1 + 



+ ^2(p) 



1 



PT-2,1 > 



c- P92{p) 
c + p 



with $0 \leq \Phi_{T-2} \leq \rho (1 + g_2(\rho))$. As before, it is straightforward to check that the set within the
indicator function converges to the empty set as $\rho \downarrow 0$ and we can write $\Psi_{T-3}$ as



*T-3 
93{P) 



E[^t-2\It-3] = PQsip) 

^L+l 



E 



1 -cEi=2 <lT-2,j 



1 +PE,=2 (lT-2,j 



+ ^2(P) ■ 1 <PT-2,1 > 



c- pg2{p) 



'r-3 



with $\lim_{\rho \downarrow 0} g_3(\rho) = 0$ and $\Psi_{T-3} / \rho \to 0$.
Following the same logic inductively, it can be checked that

$$\frac{\Psi_k}{\rho} \to 0, \quad 1 \leq k \leq T,$$

independent of the choice of $T$. That is, we have



1+pEj=2 <lk,j 



Thus, the test structure reduces to stopping when 

L+1 



j=2 



1 1-^ 

c ■ 1+*^ 



and using the limiting form for $\Psi_k$ as $\rho \to 0$, we have the threshold structure (as stated). The
proof is complete by going from the finite-horizon DP to the infinite-horizon version as in the
proof of Theorem 1. Note that while we expect the limiting test structure in the finite-horizon
setting to be dependent on $T$, it is not seen to be the case in this work because $\rho = 0$ is a
discontinuity point for the DP. ■
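As the abstract states, the limiting test thresholds the a posteriori probability that no change has happened. A minimal sketch of such a stopping rule follows. The function name `threshold_stopping_time`, the posterior trajectory, the threshold value, and the use of a non-strict comparison are all hypothetical; the actual threshold is set by the false alarm constraint and is not reproduced here.

```python
def threshold_stopping_time(no_change_posteriors, threshold):
    """Return the first index k at which the posterior probability of
    'no change yet' is at or below the threshold, or None if it never is."""
    for k, p_no_change in enumerate(no_change_posteriors):
        if p_no_change <= threshold:
            return k
    return None


# Hypothetical posterior trajectory: belief in "no change" decays over time.
trajectory = [0.99, 0.97, 0.90, 0.60, 0.20, 0.05]
stop_at = threshold_stopping_time(trajectory, threshold=0.25)
never = threshold_stopping_time(trajectory, threshold=0.01)
print(stop_at, never)
```

Only a scalar summary of the posterior has to be tracked against a fixed level, which is what makes this test low in complexity compared with the general optimal stopping rule.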

C. Proof of Proposition |?] 

We first intend to show that a version of [29, Lemma 1] holds in our case. More precisely,
our goal is to show that for any $\epsilon \in (0, 1)$, we have

$$\lim_{\alpha \to 0} \sup_{k \geq 1} P_k(\{ k \leq \tau < k + (1 - \epsilon) L_\alpha \}) = 0,$$

where $P_k(\{\cdot\})$ denotes the probability measure when $\Gamma_1 = k$ and

$$L_\alpha = \frac{\log\left(\frac{1}{\alpha}\right)}{L D(f_1, f_0) + |\log(1 - \rho)|}.$$

Note that $L_\alpha \to \infty$ as $\alpha \to 0$. Following along the logic of the proof of [29, Lemma 1] here, it
can be seen that

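$$P_k(\{ k \leq \tau < k + (1 - \epsilon) L_\alpha \}) \leq \exp\big( (1 - \epsilon^2) q L_\alpha \big) P_\infty(\{ k \leq \tau < k + (1 - \epsilon) L_\alpha \}) + P_k\Big( \Big\{ \max_{0 \leq n < (1 - \epsilon) L_\alpha} Z_n^k \geq (1 - \epsilon^2) q L_\alpha \Big\} \Big), \quad (9)$$

where $q = L D(f_1, f_0)$, $P_\infty(\{\cdot\})$ denotes the probability measure when no change happens, and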

$$Z_n^k = \sum_{\ell=1}^{L} \sum_{i=\Gamma_\ell}^{k+n} \log \frac{f_1(Z_{i,\ell})}{f_0(Z_{i,\ell})}$$

with $\Gamma_1 = k$.

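For the first term in (9), we have the following. With the appropriate definitions of $q$ and $L_\alpha$,
and the tail probability distribution of a geometric random variable, it is again easy to check (as
in the proof of Lemma 1) that for any $\tau \in \Delta_\alpha$, we have

$$\exp\big( (1 - \epsilon^2) q L_\alpha \big) P_\infty(\{ k \leq \tau < k + (1 - \epsilon) L_\alpha \}) \to 0 \quad \text{as } \alpha \to 0$$

for any $\epsilon \in (0, 1)$ and all $k \geq 1$. For the second term in (9), we need a condition analogous
to [29, eqn. (3.2)]:

$$P_k\Big\{ \frac{1}{M} \max_{0 \leq n < M} Z_n^k \geq (1 + \epsilon) q \Big\} \to 0 \quad \text{as } M \to \infty, \text{ for all } \epsilon > 0 \text{ and } k \geq 1.$$

This is trivial since the following is true:

$$\frac{Z_n^k}{n} \to L D(f_1, f_0) \quad \text{as } n \to \infty \quad (10)$$

for all $k \in [1, \infty)$.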

The above condition follows from the following series of steps. First, note that the strong law 
of large numbers for i.i.d. random variables implies that 

^ + -EEl°gf7#4) LD{f, J,) = q as n ^ oo. 



=2 j=ri 



Then, it can be easily checked that 

Since minp£_i^^ > from the statement of the proposition, we have E[z(] G (0, oo) for all 
$\ell = 2, \cdots, L$, and hence, the condition in (10) holds. Applying the condition in (10) with $M = (1 - \epsilon) L_\alpha$ as $\alpha \to 0$, we have the equivalent of [29, Lemma 1].
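The law-of-large-numbers limit in (10) can be illustrated with a small simulation. This is only a sketch under stated assumptions: unit-variance Gaussian observations with a mean shift after the change (so that $D(f_1, f_0)$ equals half the squared shift), a statistic that sums each sensor's post-change log-likelihood ratios from its own change time up to $k + n$, and hypothetical change times. The name `normalized_llr_sum` is introduced here for illustration.

```python
import random


def normalized_llr_sum(change_times, horizon, mean_shift, rng):
    """Z_n / n for unit-variance Gaussian data, where sensor l sees the
    post-change mean `mean_shift` from change_times[l] onward.
    change_times[0] plays the role of Gamma_1 = k; horizon plays the role of n."""
    end = change_times[0] + horizon  # last time index, k + n
    total = 0.0
    for gamma in change_times:
        for _ in range(gamma, end + 1):
            x = rng.gauss(mean_shift, 1.0)  # post-change sample
            # log f1(x)/f0(x) for N(mean_shift, 1) against N(0, 1)
            total += mean_shift * x - mean_shift ** 2 / 2.0
    return total / horizon


rng = random.Random(0)
change_times = [10, 15, 27]  # hypothetical Gamma_1 <= Gamma_2 <= Gamma_3
estimate = normalized_llr_sum(change_times, 100_000, 1.0, rng)
target = len(change_times) * 1.0 ** 2 / 2.0  # L * D(f1, f0) with D = 1/2
print(estimate, target)
```

The finite propagation delays remove only a bounded number of terms from the sum, so their effect vanishes after dividing by $n$. This is the role played by the finiteness of the expected delays in the argument above.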

The proposition follows by application of an equivalent version of [29, Theorem 1, eqn. (3.14)] 
which follows exactly as in [29]. ■ 
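To get a feel for the scaling used in this proof, the sketch below evaluates a delay scale of the form $\log(1/\alpha) / (L D(f_1, f_0) + |\log(1 - \rho)|)$ for a few false alarm levels. The Gaussian mean-shift model (for which the K-L divergence is the squared shift over twice the variance), the parameter values, and the function names are assumptions made for illustration.

```python
import math


def gaussian_kl(mean_shift, sigma):
    """K-L divergence between N(mean_shift, sigma^2) and N(0, sigma^2)."""
    return mean_shift ** 2 / (2.0 * sigma ** 2)


def delay_scale(alpha, num_sensors, kl, rho):
    """log(1/alpha) / (L * D + |log(1 - rho)|) for false alarm level alpha."""
    return math.log(1.0 / alpha) / (num_sensors * kl + abs(math.log(1.0 - rho)))


kl = gaussian_kl(1.0, 1.0)
alphas = (1e-2, 1e-4, 1e-8)
scales = [delay_scale(a, num_sensors=5, kl=kl, rho=0.01) for a in alphas]
for a, s in zip(alphas, scales):
    print(a, s)
```

The scale grows only logarithmically as the false alarm level shrinks, and it decreases as the number of sensors or the per-sensor divergence increases.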



D. Completing Proofs of Statements in Sec. VII

Proof of Prop. ^ We start from ([3]) and apply the recursion relationship for {qk-i.e}- Noting 
that wl-^Wj = for all j such that m < j < £, we can collect the contributions of different 
terms and write X]^=i Q'fc-i j 

t 1 ^ 

3=1 ^ j=l 

where {-Bfc-ij/} is as defined in the statement of the proposition. Thus, we have 

(Ik^iJ = ^— 2^ qk-2,j ■{!+ Ck-2A 

j=i P \j=i / 

. _ 1 2^j=i(lk-2,jWjUk-i,j,e 

s,fc-2/ — — — zjTZTT ■ } • 

(1 - p£_i,^) 1 [.^^ Lk-i,j 2^j-=i gfc-2,i w) 

Iterating the above equation, we have the conclusion in the statement of the proposition. 



It is useful to reduce Prop. 6 to the case of [32] when $\rho_{\ell-1,\ell} = 1$ for all $\ell = 2, \cdots, L$. For
this, note that $\alpha_{k,\ell}$ (and hence, $q_{k,\ell}$) are identically zero for all $2 \leq \ell \leq L$. Thus, we have

$$q_{k,L+1} = \alpha_{k,L+1} \cdot \prod_{j=1}^{L} \prod_{m=1}^{k} L_{m,j} \cdot \prod_{m=0}^{k-2} (1 + C_{m,L+1}).$$

We then have the following reductions: 

ak,L+i = ITT • 1 



Cm, 



L+l 



1 Bm+l,l,L+l 



]\j=l Lm+l,j 1 + (lm,L+l 

Bm+i,i,L+i = 1 - P and hence, 

[L T k-l ( , T-ri 



Qk,L+l 



Ili=l^k,j T-r j 1 , n,=l-^m,j 



Hi 



1 - P Jrio I ^ + gm-l,L+l 1 - P 



with the initial condition that $q_{-1,L+1} = 0$ and $L_{0,j} = 1$ for all $j$. It is straightforward to establish
via induction that the only way in which the above recursion can hold is if $q_{k,L+1}$ satisfies

$$q_{k,L+1} = \frac{\prod_{j=1}^{L} L_{k,j}}{1 - \rho} \cdot (1 + q_{k-1,L+1}),$$

which, as expected, is the same recursion as dH). ■ 
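For the instantaneous-propagation case, the classical single-statistic recursion can be written either for the posterior probability of change or for its odds. The sketch below checks numerically that the two forms agree. It uses the standard Shiryaev update with a geometric prior of parameter `rho`; the likelihood-ratio values are hypothetical, and how this maps onto the normalization of $q_{k,L+1}$ used in this appendix is an assumption rather than something the text confirms.

```python
def shiryaev_posterior_step(p_prev, rho, likelihood_ratio):
    """Posterior probability that the change has occurred, after one observation."""
    p_pred = p_prev + (1.0 - p_prev) * rho  # change may occur before this sample
    numerator = p_pred * likelihood_ratio
    return numerator / (numerator + (1.0 - p_pred))


def odds_step(r_prev, rho, likelihood_ratio):
    """Same update written for the odds r = p / (1 - p)."""
    return (r_prev + rho) * likelihood_ratio / (1.0 - rho)


rho = 0.05
# Hypothetical joint likelihood ratios (product over sensors) at successive times.
likelihood_ratios = [0.8, 1.1, 0.6, 2.5, 3.0, 1.7]

p, r = 0.0, 0.0
max_gap = 0.0
for lr in likelihood_ratios:
    p = shiryaev_posterior_step(p, rho, lr)
    r = odds_step(r, rho, lr)
    max_gap = max(max_gap, abs(r - p / (1.0 - p)))
print(max_gap)
```

If the odds are rescaled by `rho`, the update takes the form "likelihood ratio over $(1 - \rho)$, times one plus the previous value", which is the shape of recursion one would expect this special case to reduce to.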
Proof of Prop. ^ First, note that if we can find {U^} such that for all k 

/L+l \ 

log ^ ak/ ■ Ci ■ ■ ■ C^_i ■ Jij < Uk, 

then > i^u,A where 

uu^A = inf > a|. 
We use Lemma |4] to obtain the following bound and the associated {Uk}'- 

Oik/ ■ Oi • • ■ ■ Ji S / , ; • ; 

I — p 1 — p 

e=2 £=2 ^ m=l ^ 



< 



/L+l £-1 \ _ 



Zlp=o(l ~ Pp,P+l) 11^=1 



\£=2 j=l / m=l ^ 



- T^p'[^ i-p -ll^^'^j-ll T^p 

^ \p=l ^ j=l / m=l ^ 

^ -D Zlp=o(l ^ Pp,P+l) 11^=1 



n 



m=l 



where Dp = Ff! i Pi i+i ■ i n ^ f^'^^^ ] , D = 1 + max - — - — . With the above bound, we 

t 1 ij=l rjj-i-J- \^A^j=u i-p y ... i--pe.e+i 



have 



>inf i ^log 



[ Ep=0 (1 -Pp,P+l) n ^mjA 



m=l 



1-p 



v 



> A + log 



D 



J 



log 



1-p 



The conclusion follows by using Lemma [3] and noting that E 

(0,oo). 

Proof of Prop. ^ This proof is a formal write-up of the heuristic presented before the statement 
of Prop, m Following the definition of ?7j and the fact that < ?7j < 1, we have 

j 

m=e* 

Suppose there exists an < L as defined in invoking Lemma [2] with the fact that A^* j < 
for all j > £*, we have 



1 k^oo 

- ^ log (1 + r]iXi) ^ a.s. and 



m mean. 



e=e* 



Thus, we have 



m mean. 



1 1 k 

- ^ log (1 + r](,xe) - - ^ log (1 + rjiXi) ^ a.s. and 

e=2 1=2 

The main contribution to (7) is now established via induction. Since $\eta_2 = 1$, we can expand
the sum as (modulo the a.s. and in mean convergence parts):



-5^1og(l + r/,x,)--log l + 5^n 

£=2 \ 1=2 m=2 



k^oo 



0. 



If $\ell^* = 2$, it is clear that the proposition is true. If $3 \leq \ell^* \leq L + 1$, since $2 < \ell^*$, by the definition
of $\ell^*$, there exists (a smallest choice) $j_2 \geq 2$ such that



J2 

n 

m=2 
P 

n 

m=2 



Xr 



A;— +00 



oo with 



or 0(1) for all 2 < p < j2 - 1 



provided the set [2, ■ ■ ■ , j2 — 1] is not empty. There are two possibilities: j2 = — l or j2 < t — 1. 
(Note that 22 > ^* results in a contradiction since it will imply Y[m=e* ~^ 00, but we know 
this is not true from the definition of $\ell^*$). In the first case, we are done upon invoking Lemma 2.
In the second case, iterating by replacing 2 with $j_2 + 1$ (as many times as necessary) and finally
invoking Lemma 2 and noting the main contribution of the sum in (7), we arrive at the conclusion
of the proposition. ■